Abstract: Integrating the pattern of genome-wide inter-site associations with the effects of evolutionary forces is important for gaining insights into genomic evolution in natural or artificial populations. Here, we assess inter-site correlation blocks and their distributions along chromosomes. A correlation block is broadly defined as a DNA segment within which strong correlations exist between the genetic diversities at any two sites. We bring together population genetic structure and genomic diversity structure, which have been developed independently and at different scales, and synthesize the existing theories and methods for characterizing genomic structure at the population level. We discuss how population structure could shape correlation blocks and their patterns within and between populations. The effects of evolutionary forces (selection, migration, genetic drift, and mutation) on the pattern of genome-wide correlation blocks are discussed. We also briefly discuss the associations between the pattern of correlation blocks and genome assembly features in eukaryotic organisms, including the impacts of multigene families, the perturbation by transposable elements, and repetitive nongenic sequences and GC-rich isochores. Our review suggests that the observable pattern of correlation blocks can refine our understanding of the ecological and evolutionary processes underlying genomic evolution at the population level.
INTRODUCTION

Determining how much genetic diversity exists in a species, and explaining how this diversity coexists in terms of its origin, organization, and maintenance, are of paramount importance in the study of population genetic structure. The analysis of genetic diversity often assumes random recombination of genes at different loci. In that case, single-locus estimates of genetic diversity and their average across loci are adequate for describing the pattern of genetic diversity. However, many selective and non-selective evolutionary forces can cause non-random allelic associations among loci. This makes it necessary to study, in structured populations, both the joint diversity at multiple loci (genomic diversity) and the inter-site associations along chromosomes (the structure of genomic diversity).

One approach to assessing the structure of population genomic diversity is to measure the association of genetic diversities among linked sites. The DNA segment within which strong (or significant) correlations of genetic diversity exist among linked sites is broadly termed a correlation block. For instance, the well-known gametic linkage disequilibrium (LD) is the correlation between allele frequencies among sites, and the corresponding correlation block is the haplotype block [1, 2]. Here, the meaning of a correlation block is extended. It can refer to a DNA segment within which strong correlations exist between heterozygosities ($H_e$'s) at linked sites within individual subpopulations, between population differentiation coefficients ($F_{st}$'s) at linked sites on the same chromosome, or between genetic statistics other than these variables. Compared with gametic LD, the correlations between $H_e$'s or between $F_{st}$'s among linked sites are higher-order associations. One important difference between the haplotype block and a higher-order correlation block is that allele linkage phase can be inferred in the haplotype block, whereas the correlation between $H_e$'s or $F_{st}$'s does not require information on linkage phase. What they have in common is that both correlations suffer from sampling errors. The threshold for determining a block size could vary with the type of correlation block, although different blocks might partially or completely overlap on the same chromosome [3]. For instance, the logarithm of odds (LOD) is used to determine blocks of the square of the standardized gametic LD, $r_D^2$, which differs from the criteria for determining $D'$ blocks [4]. A correlation block itself is a purely statistical concept, and its biological meaning emerges only when it is linked to the effects of evolutionary forces. Partial overlap of different types of blocks in the same chromosomal regions may arise from the effects of distinct evolutionary processes.
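To make the block definition concrete, the Python sketch below groups consecutive SNPs into correlation blocks from a matrix of pairwise correlations, using a simple hypothetical rule: a SNP joins the current block only if its absolute correlation with every SNP already in the block reaches a chosen threshold. The function name `call_correlation_blocks` and the default threshold of 0.3 are illustrative assumptions; this is not the LOD-based or $D'$-based criterion cited above, and any pairwise statistic ($r_D^2$, a heterozygosity correlation, or an $F_{st}$-correlation) could be supplied as the matrix.

```python
import numpy as np


def call_correlation_blocks(corr, threshold=0.3):
    """Greedy block calling on a square matrix of pairwise inter-site correlations.

    Sites are assumed to be ordered by chromosomal position. A site is appended to
    the current block only if |corr| with every site already in the block is
    >= threshold; otherwise a new block starts. Returns (first, last) index pairs.
    This is a hypothetical illustration, not a published block-definition algorithm.
    """
    corr = np.abs(np.asarray(corr, dtype=float))
    n = corr.shape[0]
    blocks = []
    start = 0
    for j in range(1, n):
        # Compare site j with all sites in the current block [start, j).
        if np.all(corr[start:j, j] >= threshold):
            continue
        blocks.append((start, j - 1))
        start = j
    if n > 0:
        blocks.append((start, n - 1))
    return blocks


# Toy matrix: sites 0-2 are mutually correlated, sites 3-4 form a second block.
toy = np.eye(5)
toy[np.ix_([0, 1, 2], [0, 1, 2])] = 0.8
toy[np.ix_([3, 4], [3, 4])] = 0.6
np.fill_diagonal(toy, 1.0)
print(call_correlation_blocks(toy, threshold=0.3))  # [(0, 2), (3, 4)]
```

Requiring every pair within a block to pass the threshold mirrors the verbal definition ("strong correlations between any two sites"); a looser rule, such as comparing only adjacent sites, would produce longer but less coherent blocks.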

The significance of examining the pattern of correlation blocks (their size, abundance, and distribution) is manifold when linked to the effects of evolutionary forces. First, this pattern can provide insights into evolutionary divergence among different chromosomal regions. Chromosomal regions with large block sizes might have experienced evolutionary processes different from those in regions with small block sizes, such as heterogeneity in selection strength, recombination rate, and mutation rate. Second, the pattern may signal regional variation in co-evolution at the population level when positive or negative correlation blocks reveal distinct processes. Third, the pattern can facilitate genetic improvement of quantitative traits when quantitative trait nucleotides (QTN) [5] are mapped within correlation blocks, because a block-based approach is easier to manipulate than individual single nucleotide polymorphisms (SNPs).

Current empirical studies on correlation blocks mainly focus on haplotype blocks, as in the HapMap human genome project [6, 7]; few studies examine other types of blocks or compare them with haplotype blocks, and few relate correlation blocks to population genetic structure [8]. The purpose of this synthetic review is to discuss the importance of studying the pattern of correlation blocks in structured populations, complementing recent reviews on population genomics in which the structure of genomic diversity has not been emphasized [2, 9-11]. Here, we argue that the pattern of correlation blocks along chromosomes is informative for inferring the underlying evolutionary processes. Fig. (1) illustrates how evolutionary forces could shape the structure of genomic diversity in natural populations. Such structure of genomic diversity could vary among populations and organisms.

We review the impacts of population genetic structure on the pattern of genome-wide correlation blocks from a theoretical perspective, focusing on the analytical methods that describe this structure and relating the correlation block pattern to evolutionary processes. Previous studies rarely connect conventional population genetic structure with the pattern of genomic diversity, mainly because the two subjects developed over a long period at very different scales and because large numbers of sequenced genomes were unavailable. Aspects of genomic evolution that have been evaluated elsewhere [12], including LD mapping and some statistical issues in outlier detection [13-15], are not considered here. Our synthesis also differs from previous reviews of genomic structure from a variety of perspectives [16, 17]. Here, we discuss the pattern of correlation blocks within and between populations. We then consider the possible relations between the pattern of correlation blocks and the genome architecture of eukaryotic organisms, including the effects of multigene families, transposable elements (TE), and nongenic sequences and GC isochores.

CORRELATION BLOCKS WITHIN POPULATIONS 

Mechanisms for Maintaining Inter-Site Correlations 

Variables for calculating inter-site correlation blocks within populations may be those that denote genetic variation, such as allele frequency and heterozygosity. The inter-site correlations of these variables become biologically meaningful only when they are associated with evolutionary forces. Mechanisms for maintaining inter-site correlations are complicated from an evolutionary perspective. For a given pair of linked SNP sites, the correlation can involve one of three combinations: selective-by-selective, selective-by-neutral, and neutral-by-neutral sites. Correlation between linked selective sites can result from a variety of selection systems. The interaction in the selective-by-selective combination depends on the type of selection at individual sites (e.g., directional selection, heterozygote advantage or disadvantage, antagonistic selection, and frequency-dependent selection). As the number of combinations increases, it becomes progressively more difficult to reveal the relative contributions of differently combined selection systems; the distinction is difficult even among different types of balancing selection [12, 18]. The correlation between selective sites can be enhanced in structured populations where immigration increases their LD [19, 20]. Heterogeneity in selection systems among regions of the same chromosome produces different extents of inter-site correlation.

Fig. (1). This diagram illustrates the effects of the basic evolutionary forces (selection, mutation, migration, and genetic drift) on genomic diversity in natural populations. Its three panels state: (i) mutation and migration increase genetic variation, whereas selection and genetic drift generally reduce it, and other events are directly or indirectly associated with these basic forces; (ii) all basic forces and nonrandom mating may cause linkage disequilibrium (LD), whereas recombination reduces LD; (iii) migration and genetic drift cause whole-genome changes, whereas mutation, selection, and recombination cause regional genome changes, producing a structured pattern of genomic diversity. The pattern of genomic diversity along chromosomes can be assessed when multiple sites and their linkage phases along the chromosomes are assessed simultaneously, which extends conventional population genetics studies to the genome scale.

Correlation between linked selective and neutral sites is also complex, especially when multiple selective sites jointly affect the linked neutral sites [21]. The indirect effects come either from background selection owing to deleterious mutations at the selective sites [22] or from hitchhiking owing to advantageous mutations at the selective sites [23]. The transient correlation between selective and neutral sites can be reinforced when immigration is present, as implied by results for the cytonuclear system [21]. Heterogeneity in background selection or in genetic hitchhiking among regions of the same chromosome produces transient blocks of varying sizes.

Correlations between linked neutral sites are often transient because of recombination and are related to the number and length of neutral DNA segments. Transient correlation between neutral sites can arise from genetic drift in populations with a short history [24] and/or from immigration. Introns with various secondary structures (e.g., group I and group II introns) involve tight linkage between distant sites. Chromosomal regions with consecutive neutral sites, such as some noncoding or intronic regions, eventually form the intervals that flank different correlation blocks. For instance, the average intron length in the human genome is 4.66 kb, which generates an enormous number of tiny exon islands with an average length of 0.15 kb [17, p.49]. This implies that, on average, block sizes are probably smaller in the human genome than in species with shorter introns, such as Caenorhabditis and Drosophila [17, pp.49-50].

Statistically, significant gametic LD is the basis for maintaining inter-site correlations, since higher-order inter-site associations are functions of lower-order associations [2, 13, 25-27]. The distribution of gametic LD along chromosomes is associated with heterogeneous recombination rates [28-31], which generate inter-site correlation blocks of different sizes along the chromosomes. Evolutionary mechanisms that maintain gametic LD can directly or indirectly affect higher-order inter-site associations, although the reverse is not true: higher-order inter-site correlations can arise from interactions other than gametic LD, such as zygotic epistasis between linked sites. There is no one-to-one correspondence in mechanism between lower- and higher-order LDs.

Methods for Measuring Correlation Blocks 

Biologically, mapping correlation blocks differs from mapping genetic variation at individual sites, since the former reflects inter-site associations while the latter does not. For instance, an IBD (identity by descent) map describes the diversity at individual sites and cannot reveal the co-evolution of linked sites [32, 33]. Correlation block maps can reveal the pattern of co-evolutionary variation along chromosomes. For instance, methods for estimating the correlations of pairwise relatedness coefficients at linked sites [33] and for estimating the correlations of non-allele descents [34] can be applied to construct inter-site association maps. Other methods, such as wavelet analysis [29] and joint estimation of multilocus inbreeding coefficients [35, 36], can also be used.

One common measure of inter-site association is the square of the standardized gametic LD, $r_D^2$, which describes the correlation of allele frequencies between linked sites [28, 37]. This statistic differs from the correlation of pairwise relatedness or the correlation of heterozygosities, because different components of genetic variation are used [38]. Information on either inter-site IBD or within-site IBD is not singled out in gametic LD or $r_D^2$ mapping; only information on identity in state (IIS) is used, even though IIS is a function of IBD [33, 34, 39]. The resulting maps of correlation blocks along chromosomes differ because of their different sensitivity to recombination, which reduces the probability of inter-site IBD for a given pair of linked nuclear sites. A correlation block map based on non-allele or allele descent measures can therefore differ from gametic LD or $r_D^2$ blocks in signaling co-evolution driven by natural or artificial factors. This can occur in a large population with a long history, where only small descent blocks survive, in contrast to a population with a short history, where large descent correlation blocks exist.
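As a minimal sketch of the gametic statistics discussed here, the Python code below computes $D$, Lewontin's $D'$, and $r_D^2$ for two diallelic sites from phased haplotypes coded 0/1. It relies on the standard identities $D = p_{A_1B_1} - p_{A_1}p_{B_1}$, $r_D^2 = D^2/(p_Aq_Ap_Bq_B)$, and $D' = D/D_{max}$; the function name and input layout are our own assumptions. Only identity in state enters these calculations, which is the point made above about IIS versus IBD.

```python
import numpy as np


def gametic_ld(hap_a, hap_b):
    """Return (D, signed D', r_D^2) for two diallelic sites from phased haplotypes.

    hap_a, hap_b: sequences of 0/1 alleles, one entry per haplotype (chromosome),
    with 1 denoting allele A1 at site A and allele B1 at site B.
    """
    a = np.asarray(hap_a, dtype=float)
    b = np.asarray(hap_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()          # allele frequencies of A1 and B1
    p_ab = np.mean(a * b)                  # frequency of the A1B1 haplotype
    d = p_ab - p_a * p_b                   # gametic LD coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    r2 = d * d / denom                     # square of standardized gametic LD
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return float(d), float(d / d_max), float(r2)


# Two sites in complete coupling: only A1B1 and A2B2 haplotypes occur.
print(gametic_ld([1, 1, 0, 0], [1, 1, 0, 0]))  # (0.25, 1.0, 1.0)
```

Because $r_D$ equals the Pearson correlation of the 0/1 allele indicators across haplotypes, $r_D^2$ can also be obtained as the squared output of a generic correlation routine, which is convenient when scanning many SNP pairs.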

The correlation of heterozygosities describes an alternative pattern of genomic diversity, although zygotic LD is a complicated function of gametic LD [25, 27]. To examine their differences, we synthesize existing theories to calculate the correlation of heterozygosities under a purely neutral process. Consider two diallelic neutral SNP sites with recombination rate $r$ in a random-mating population of effective size $N_e$. Let $A_1$ and $A_2$ be the alleles at site A, with initial allele frequencies $p_A$ and $q_A$, respectively, and let $B_1$ and $B_2$ be the alleles at site B, with initial allele frequencies $p_B$ and $q_B$, respectively. Let $H_A$ and $H_B$ be the heterozygosities at sites A and B, respectively. The correlation coefficient of heterozygosities at generation $t$, $R_t$, is calculated as

$$R_t = \frac{\mathrm{cov}(H_A, H_B)}{\sigma_{H_A}\,\sigma_{H_B}}, \qquad (1)$$

where $\mathrm{cov}(H_A, H_B) = E(H_AH_B) - E(H_A)E(H_B)$, in which $E(H_A)$, $E(H_B)$, and $E(H_AH_B)$ are the expected heterozygosities at site A, at site B, and at both sites, respectively, and $\sigma^2_{H_A}$ and $\sigma^2_{H_B}$ are the variances of heterozygosity at sites A and B, respectively.
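Because $R_t$ in Eq. (1) is a correlation taken over the possible outcomes of genetic drift, it can also be approximated by simulation. The Python sketch below assumes a standard two-locus Wright-Fisher model (recombination acting on haplotype frequencies, followed by multinomial sampling of $2N_e$ gametes), takes heterozygosity as $2pq$, and estimates $R_t$ as the Pearson correlation of $H_A$ and $H_B$ across replicate populations. The function name, replicate count, and seed are illustrative choices, not part of the theory above.

```python
import numpy as np


def simulate_r_t(n_e, r, t, p_hap, n_rep=2000, seed=1):
    """Monte Carlo estimate of R_t, the correlation of heterozygosities (Eq. 1).

    p_hap: initial haplotype frequencies in the order (A1B1, A1B2, A2B1, A2B2).
    Each generation: recombination changes haplotype frequencies by -/+ r*D,
    then 2*n_e gametes are drawn multinomially (random genetic drift).
    """
    rng = np.random.default_rng(seed)
    sign = np.array([-1.0, 1.0, 1.0, -1.0])   # coupling haplotypes lose r*D
    h_a = np.empty(n_rep)
    h_b = np.empty(n_rep)
    for k in range(n_rep):
        x = np.array(p_hap, dtype=float)
        for _ in range(t):
            d = x[0] * x[3] - x[1] * x[2]      # gametic LD before sampling
            x = np.clip(x + sign * r * d, 0.0, None)
            x /= x.sum()
            x = rng.multinomial(2 * n_e, x) / (2 * n_e)
        p_a = x[0] + x[1]                     # frequency of A1
        p_b = x[0] + x[2]                     # frequency of B1
        h_a[k] = 2 * p_a * (1 - p_a)
        h_b[k] = 2 * p_b * (1 - p_b)
    return np.corrcoef(h_a, h_b)[0, 1]


# Settings echoing Fig. (2): N_e = 10, D_0 = 0.25, allele frequencies 0.5.
start = [0.5, 0.0, 0.0, 0.5]
for rate in (0.0, 0.05, 0.5):
    print(rate, round(simulate_r_t(10, rate, t=5, p_hap=start), 3))
```

With $r = 0$ and only coupling haplotypes present, the two sites share the same allele frequency in every replicate, so the estimate is 1; increasing $r$ lets the heterozygosities drift more independently.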

The expected heterozygosity at site A is $E(H_A) = 2p_Aq_A\gamma_1^t$, where $\gamma_1 = 1 - 1/(2N_e)$ ($p_A$ and $q_A$ equal the averages of allele frequencies over all possible outcomes of genetic drift). $E(H_B)$ can be expressed in a similar way. The variance of heterozygosity at site A, $\sigma^2_{H_A}$, is calculated as $E(H_A^2) - [E(H_A)]^2$. Using the formulae derived by Robertson [40, pp. 203-206], we obtain

$$\sigma^2_{H_A} = 4p_Aq_A\left(c - p_Aq_A\gamma_1^{t}\right)\gamma_1^{t} - 4p_Aq_A\left(c - p_Aq_A\right)\gamma_2^{t}, \qquad (2)$$

where $\gamma_2 = \gamma_1(1 - 1/N_e)(1 - 3/(2N_e))$ and $c = (2N_e - 1)/(10N_e - 6)$, which is close to $1/5$ for large $N_e$. $\sigma^2_{H_B}$ is obtained by replacing subscript A in the above equation with subscript B.
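The variance of heterozygosity can be checked numerically. Assuming the standard Wright-Fisher model with $2N_e$ gene copies (the model under which the decay rates $\gamma_1 = 1 - 1/(2N_e)$ and $\gamma_2 = \gamma_1(1 - 1/N_e)(1 - 3/(2N_e))$ arise), exact binomial moments give $E(H_A) = 2p_Aq_A\gamma_1^t$ and $\sigma^2_{H_A} = 4p_Aq_A(c - p_Aq_A\gamma_1^t)\gamma_1^t - 4p_Aq_A(c - p_Aq_A)\gamma_2^t$ with $c = (2N_e - 1)/(10N_e - 6)$. The Python sketch below propagates the exact distribution of allele counts for $t$ generations and compares the resulting mean and variance of $H = 2pq$ with this closed form; both function names are hypothetical.

```python
from math import comb

import numpy as np


def exact_h_moments(n_e, p0, t):
    """Mean and variance of H = 2pq after t generations of Wright-Fisher drift,
    obtained by iterating the exact binomial transition matrix (2*n_e gene copies)."""
    n = 2 * n_e
    freqs = np.arange(n + 1) / n
    # trans[i, j] = P(j copies next generation | i copies now)
    trans = np.array([[comb(n, j) * f**j * (1 - f) ** (n - j) for j in range(n + 1)]
                      for f in freqs])
    dist = np.zeros(n + 1)
    dist[round(p0 * n)] = 1.0            # start from allele frequency p0
    for _ in range(t):
        dist = dist @ trans
    h = 2 * freqs * (1 - freqs)
    mean = dist @ h
    return mean, dist @ h**2 - mean**2


def closed_form_h_moments(n_e, p0, t):
    """E(H) = 2pq*g1^t and the closed-form variance of H."""
    pq = p0 * (1 - p0)
    g1 = 1 - 1 / (2 * n_e)
    g2 = g1 * (1 - 1 / n_e) * (1 - 3 / (2 * n_e))
    c = (2 * n_e - 1) / (10 * n_e - 6)
    var = 4 * pq * (c - pq * g1**t) * g1**t - 4 * pq * (c - pq) * g2**t
    return 2 * pq * g1**t, var


print(exact_h_moments(10, 0.5, 8))
print(closed_form_h_moments(10, 0.5, 8))
```

The agreement is exact (up to rounding) because one generation of binomial sampling maps $E(pq)$ to $\gamma_1E(pq)$ and $E(p^2q^2)$ to a linear combination of $p^2q^2$ and $pq$, so only the two decay rates $\gamma_1$ and $\gamma_2$ appear.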

Following Ohta and Kimura [41, p. 52], the expected frequency of double heterozygotes at generation $t$ is

$$E(H_AH_B) = 4\sum_i C_{\lambda_i}\Big[E_0(\cdots) + (\cdots)(3 + 4N_e r + 2\lambda_i)\,D_0(1-2p_A)(1-2p_B) + (\cdots)\,D_0^2\Big]\exp(\lambda_i t/N_e), \qquad (3)$$



where $\lambda_i$ is a constant related to the decay rate of $E(H_AH_B)$, $C_{\lambda_i}$ is a function of $\lambda_i$ [41, p. 52], and $D_0$ is the initial linkage disequilibrium in the population. Fig. (2A) shows how $R_t$ changes with time and with the recombination rate, indicating that strong transient correlation blocks are present only within short distances (tightly linked sites). Fig. (2B) shows that the transient gametic LD, $D_t = \exp[-(2N_e r + 1)t/(2N_e)]\,D_0$ [42], decays faster with time within short distances than the transient zygotic LD, $\mathrm{cov}(H_A, H_B)$, although gametic LD is greater than zygotic LD within a short range. The presence of natural or artificial selection may bias the pattern away from the expectations under a purely neutral process; this remains to be explored theoretically.
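The exponential form of the transient gametic LD, $D_t = \exp[-(2N_e r + 1)t/(2N_e)]\,D_0$, approximates the per-generation decay under drift and recombination, $E(D_{t+1}) = (1 - r)(1 - 1/(2N_e))E(D_t)$, which is the standard Wright-Fisher result we assume here. The short Python sketch compares the two for the Fig. (2) settings ($N_e = 10$, $D_0 = 0.25$); the function names are hypothetical.

```python
import math


def d_t_exact(d0, n_e, r, t):
    """Expected gametic LD after t generations: E(D_t) = [(1 - r)(1 - 1/(2 N_e))]^t D_0."""
    return ((1 - r) * (1 - 1 / (2 * n_e))) ** t * d0


def d_t_exponential(d0, n_e, r, t):
    """Continuous-time approximation D_t = exp(-(2 N_e r + 1) t / (2 N_e)) D_0."""
    return math.exp(-(2 * n_e * r + 1) * t / (2 * n_e)) * d0


for r in (0.0, 0.01, 0.1, 0.5):
    print(f"r={r:<5} exact={d_t_exact(0.25, 10, r, 10):.4f} "
          f"approx={d_t_exponential(0.25, 10, r, 10):.4f}")
```

The two agree closely for tightly linked sites (small $r$) and diverge as $r$ approaches 0.5, where the exponential form overstates the remaining LD; since $e^{-x} \ge 1 - x$, the approximation never falls below the per-generation value.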

As an example, we compared the structures of zygotic and gametic LDs on one human chromosome (Chr. 21) in the CHB (Han Chinese Beijing) population. Data were downloaded from ftp://ftp.sanger.ac.uk/pub/hapma3/r3, provided by the Human Genome Project group at the Wellcome Trust Sanger Institute. There were 137 individuals in this population and 18707 SNPs on Chr. 21 (the chromosome with the smallest number of SNPs in this population). After removing SNPs with minor allele frequency (MAF) smaller than 0.05, 15817 SNPs were used for both the zygotic and gametic LD analyses. Fig. (3A) shows the pattern of pairwise gametic and zygotic LDs with distance, showing that the correlation of heterozygosities was generally weaker than gametic LD. The distribution of the correlation of heterozygosities differed significantly from the distribution of gametic LD (Fig. 3B; Kolmogorov-Smirnov test, p-value < $2.2\times10^{-16}$).



Fig. (2). The change in zygotic and gametic LDs between linked neutral sites (x-axis in both panels: recombination rate, 0 to 0.5). A. The correlation of heterozygosities decays with time, measured in terms of effective population size ($N_e$), and with distance, measured in terms of recombination rate. B. A comparison between heterozygosity disequilibrium and gametic linkage disequilibrium. The result indicates that gametic LD decays faster than zygotic LD within short distances, although gametic LD is greater than zygotic LD in magnitude. Calculations are based on the synthetic theories, Eqs (1) to (3). The initial settings are $N_e = 10$, gametic linkage disequilibrium = 0.25, and allele frequency = 0.5 at each of the two diallelic sites.
Fig. (3). A. Distribution of pairwise correlations of heterozygosities (red dots), $r_H$, and gametic LD (green dots), $r_D$, with distance on human chromosome 21 in the CHB (Han Chinese Beijing) population, indicating that $r_H$ declined faster than $r_D$ with distance. B. Empirical cumulative distribution functions (e.c.d.f.; green for $r_D^2$'s and red for $r_H$'s). The Kolmogorov-Smirnov test indicated a significant difference between the $r_D^2$ and $r_H$ distributions, with p-value < $2.2\times10^{-16}$.



Fig. (4). A. … associations. B. The green lines represent the positions of SNP markers present in the subset of SNPs with strong pairwise gametic LDs ($r_D^2 > 0.3$) but absent from the subset of SNPs with strong zygotic LDs ($r_H > 0.3$); the red lines represent the reverse case. (x-axis in both panels: SNP marker positions in Mbp.)
Of the 15817 SNPs, 8.2% showed strong gametic LD ($r_D^2 > 0.3$) but weak heterozygosity correlations, whereas only 11 (0.07%) showed the reverse pattern (Fig. 4B). However, when very strong gametic and zygotic associations are considered, say $r_D^2 > 0.9$ and $r_H > 0.9$, the same pattern of correlations was observed for both (data not shown for Chr. 21 in the CHB population): only tightly linked sites maintained strong gametic and zygotic LDs. Our analysis of human chromosome 21 clearly shows that gametic and zygotic LDs had distinct structures of genomic diversity. A further analysis of interest is to map the functional roles of these distinct SNPs in gene expression.
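The steps of this comparison (MAF filtering, then pairwise gametic and zygotic correlations) can be sketched in Python as follows. We assume phased haplotypes coded 0/1 with two consecutive rows per individual, and we assume that the correlation of heterozygosities between two SNPs, $r_H$, is computed as the Pearson correlation of per-individual heterozygosity indicators; the original analysis may have used a different estimator. The simulated data and function names are purely illustrative and are not the HapMap CHB data.

```python
import numpy as np


def maf_filter(haps, min_maf=0.05):
    """Keep SNP columns whose minor allele frequency is >= min_maf.
    haps: (2 * n_individuals, n_snps) array of 0/1 alleles, rows paired by individual."""
    p = haps.mean(axis=0)
    maf = np.minimum(p, 1 - p)
    return haps[:, maf >= min_maf]


def pairwise_rd2_rh(haps):
    """Return (r_D^2, r_H) matrices for all SNP pairs.

    r_D^2: squared correlation of alleles across haplotypes (gametic LD).
    r_H:   correlation of heterozygosity indicators across individuals (zygotic LD).
    """
    rd2 = np.corrcoef(haps, rowvar=False) ** 2
    genotypes = haps[0::2] + haps[1::2]          # allele counts per individual
    het = (genotypes == 1).astype(float)         # 1 if heterozygous, else 0
    rh = np.corrcoef(het, rowvar=False)
    return rd2, rh


rng = np.random.default_rng(0)
base = rng.integers(0, 2, size=(274, 1))          # 137 hypothetical individuals
noise = rng.random((274, 6)) < np.linspace(0.0, 0.5, 6)
haps = np.where(noise, 1 - base, base)            # LD with column 0 weakens with index
haps = maf_filter(haps)
rd2, rh = pairwise_rd2_rh(haps)
upper = np.triu_indices(haps.shape[1], k=1)
print("strong gametic only:", np.sum((rd2[upper] > 0.3) & (rh[upper] <= 0.3)))
print("strong zygotic only:", np.sum((rh[upper] > 0.3) & (rd2[upper] <= 0.3)))
```

For a full chromosome, the pairwise matrices grow quadratically with the number of SNPs, so in practice one would restrict comparisons to pairs within a maximum physical distance.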

Theories previously used to measure inter-site structure at the sequence level, such as autocorrelation and spectral analysis [43-45], are also useful for describing genomic structure at the population level. The difference is that the variables here are the genetic diversities at individual sites rather than nucleotide compositions. These analyses will likely produce different patterns of genomic diversity along the chromosomes, and some of these patterns are probably unrelated to the haplotype LD block pattern.
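As an illustration of carrying sequence-level autocorrelation over to population-level variables, the sketch below computes the lag-$k$ autocorrelation of a per-site diversity statistic (for example $H_e$ or $F_{st}$) along SNPs ordered by position. It uses the conventional sample autocorrelation estimator and, as a simplifying assumption, treats SNPs as equally spaced, ignoring physical distance; the function name and the example values are hypothetical.

```python
import numpy as np


def lag_autocorrelation(values, max_lag):
    """Sample autocorrelation r_k (k = 1..max_lag) of a per-site statistic ordered
    along the chromosome: r_k = sum((x_t - m)(x_{t+k} - m)) / sum((x_t - m)^2)."""
    x = np.asarray(values, dtype=float)
    dev = x - x.mean()
    denom = np.sum(dev * dev)
    return np.array([np.sum(dev[:-k] * dev[k:]) / denom for k in range(1, max_lag + 1)])


# Hypothetical per-site heterozygosities: two plateaus produce long-range autocorrelation.
he = np.r_[np.full(50, 0.45), np.full(50, 0.15)]
print(np.round(lag_autocorrelation(he, 3), 3))
```

A slowly decaying autocorrelation indicates long runs of similar diversity along the chromosome, whereas rapid decay suggests small or absent correlation blocks for that statistic.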

Density Distribution of Correlation Blocks 

One way to summarize the pattern of correlation blocks is to examine the density distribution of correlation block sizes, similar to the method for describing the distribution of nucleotide base composition at the sequence level [43]. This can give a general picture of inter-site associations on a chromosome. The sizes of correlation blocks could be altered by evolutionary forces. Whether the density distribution of block sizes is stable remains to be studied theoretically under the balancing effects of recombination and other evolutionary forces.

Fig. (5A) shows the abundance distribution of strong pairwise gametic and zygotic LDs ($r_D^2 > 0.3$ and $r_H > 0.3$) with distance on human Chr. 21 from the CHB population. This is a negative exponential distribution, with a large number of pairwise correlations at short distances and a small number at long distances. Fig. (5B) displays the density distribution of gametic LD block sizes, measured in terms of Lewontin's $D'$ [46], which also has a negative exponential form. This is probably related to the long history of human populations, over which the effects of recombination were substantial, leading to a majority of small gametic LD blocks. Distributions other than the negative exponential cannot be excluded under the impacts of evolutionary forces, such as the non-exponential distribution of gametic LD block sizes in domestic dairy and beef cattle populations caused by long-term directional artificial selection (Li et al., unpublished data).
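A minimal sketch of this kind of summary: bin block sizes (2.5 kbp bins, the bin size reported for the $D'$ blocks) and, assuming a negative exponential form, estimate its rate by maximum likelihood as the reciprocal of the mean block size. The mean and standard deviation of block size are returned as well, since they are the genome-wide summaries used to compare populations. The block sizes below are simulated placeholders, not the CHB blocks, and the function name is hypothetical.

```python
import numpy as np


def summarize_block_sizes(sizes_kbp, bin_kbp=2.5):
    """Histogram of correlation-block sizes plus simple summary statistics.

    Returns (counts, bin_edges, mean, sd, exp_rate), where exp_rate = 1/mean is the
    maximum-likelihood rate of an exponential distribution fitted to the sizes.
    """
    sizes = np.asarray(sizes_kbp, dtype=float)
    edges = np.arange(0.0, sizes.max() + bin_kbp, bin_kbp)
    counts, edges = np.histogram(sizes, bins=edges)
    mean = sizes.mean()
    sd = sizes.std(ddof=1)
    return counts, edges, mean, sd, 1.0 / mean


rng = np.random.default_rng(42)
sizes = rng.exponential(scale=10.0, size=1811)   # hypothetical block sizes (kbp)
counts, edges, mean, sd, rate = summarize_block_sizes(sizes)
print(counts[:5], round(mean, 2), round(sd, 2), round(rate, 4))
```

For an exponential distribution the standard deviation equals the mean, so a ratio of SD to mean far from 1 is one simple signal that block sizes depart from the negative exponential form.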

Perspectives 

The outstanding challenge is to unravel the relative effects of evolutionary forces (mutation, migration, selection, and drift) in forming the pattern of correlation blocks, given the observed block sizes and their distribution. If we examine the average correlation block size and its variation (e.g., its standard deviation) at the genome-wide scale, these evolutionary forces can produce distinct patterns. Natural selection and mutation can, on average, produce smaller correlation blocks and higher variation in block size in a large population with a long history than in a small population with a short history. This is because selection and mutation can cause regional genetic variation along the chromosomes, and a long history facilitates the decay of LD through recombination.

Fig. (5). A. Distribution of pairwise correlations of gametic ($r_D^2 > 0.3$) and zygotic ($r_H > 0.3$) LDs with distance, indicating a negative exponential distribution on human chromosome 21 in the CHB (Han Chinese Beijing) population. There were 145801 pairwise gametic LDs with $r_D^2 > 0.3$ and 87411 zygotic LDs with $r_H > 0.3$; the bin size was set at 5 kbp. B. The abundance distribution of gametic LD block sizes on human chromosome 21 in the CHB population. Gametic LDs were measured by Lewontin's $D'$, and results were obtained using HaploView [1]. There were 1811 gametic LD blocks ($D' = 1.0$); the bin size was set at 2.5 kbp.

Genetic drift and immigration can increase the average correlation block size in a population with a short history. In a population with a long history, however, the correlation blocks produced by genetic drift decay with time, resulting on average in smaller block sizes. The block sizes and their distribution do not reveal immigration effects within a single subpopulation, because immigration changes LD across the whole genome (Fig. 1). To infer the effects of immigration and genetic drift, populations must be compared in terms of average correlation block size and the variance of its distribution. Block sizes and their distributions vary among populations with different demographic histories, as implied by comparisons of LD blocks among soybean populations [4]. The expected correlation block is larger for a small population with a short history than for one with a long history, because of the collapse of LD through recombination and the time-dependent effect of genetic drift. However, this is likely not the case for immigration, whose effects can increase the average size of correlation blocks.

CORRELATION BLOCKS AMONG POPULATIONS 

Variables for calculating inter-site correlations among populations may be Wright's $F_{st}$ or other genetic statistics (e.g., Nei's genetic distances at individual sites [47]). Chromosomal regions with smaller $F_{st}$'s at linked sites imply more convergent evolution among populations, and those with larger $F_{st}$'s imply more divergent evolution. Each of these two kinds of region may possess positive inter-site $F_{st}$-correlations. $F_{st}$-correlation blocks have not yet been assessed, although $F_{st}$ maps are available for human, cattle, and other organisms [48, 49].
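Because per-site $F_{st}$ maps are the raw material for $F_{st}$-correlation blocks, the sketch below computes a per-site differentiation statistic from subpopulation allele frequencies. For simplicity it uses Nei's $G_{st} = (H_T - H_S)/H_T$ with equally weighted subpopulations, a swapped-in estimator that is not the Cockerham and Weir ANOVA approach discussed later in this section; the function name and frequencies are hypothetical.

```python
import numpy as np


def nei_gst(freqs):
    """Per-site Nei's G_st from allele frequencies.

    freqs: (n_subpops, n_sites) array with the frequency of one allele per site.
    H_S is the mean within-subpopulation expected heterozygosity and H_T is the
    expected heterozygosity of the pooled (equally weighted) population.
    """
    p = np.asarray(freqs, dtype=float)
    h_s = np.mean(2 * p * (1 - p), axis=0)
    p_bar = p.mean(axis=0)
    h_t = 2 * p_bar * (1 - p_bar)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(h_t > 0, (h_t - h_s) / h_t, np.nan)   # nan at monomorphic sites


# Three hypothetical subpopulations, four sites ordered along a chromosome.
freqs = np.array([[0.1, 0.2, 0.5, 0.0],
                  [0.9, 0.8, 0.5, 1.0],
                  [0.5, 0.5, 0.5, 0.5]])
print(np.round(nei_gst(freqs), 3))
```

The resulting vector, ordered by position, is the kind of per-site series whose inter-site correlations would define $F_{st}$-correlation blocks.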

Investigating inter-site $F_{st}$-correlations differs from investigating inter-site correlations within populations. First, a strong positive or negative $F_{st}$-correlation indicates that the linked sites have undergone similar or different evolutionary processes, respectively, in different populations. Heterogeneous variation in $F_{st}$-correlation along chromosomes indicates the presence of different effects of evolutionary forces. Second, patterns of $F_{st}$-correlation blocks are informative for genetic conservation at the population level, since genetic variation within blocks provides redundant information among populations. This makes the block-based approach more effective in utilizing genome-wide divergence among populations for conservation.

Mechanisms for Maintaining Inter-Site $F_{st}$-Correlations

In principle, processes that increase inter-site LD within populations and allele frequency differentiation among populations at individual sites can facilitate inter-site $F_{st}$-correlations. Statistically, $F_{st}$-correlation is related to gametic LD within and between populations. The processes are very complicated when linked to the effects of evolutionary forces. For a pair of linked selective-by-selective sites, synergistic interactions enhance inter-site $F_{st}$-correlations while antagonistic interactions reduce them. Different forms of selection create a potentially large number of selection-by-selection combinations. One speculation is that differential selection among populations reduces the average $F_{st}$-correlation block size. For instance, selection intensities at given sites in central populations differ from those in marginal populations, which consequently changes the distribution pattern of $F_{st}$-correlation blocks between central and marginal populations. For a pair of linked selective-by-neutral sites, genetic hitchhiking and/or selective sweeps increase transient $F_{st}$-correlations. This case becomes even more complex when multiple selective sites are involved in changing a commonly linked neutral site [50, 51].

For a pair of linked neutral-by-neutral sites, genetic drift and migration both help to maintain transient $F_{st}$-correlations, but through different processes. Although genetic drift brings about whole-genome changes, differences in effective population size among populations can reduce $F_{st}$-correlation block sizes on average and change their distribution along chromosomes. This is implied by empirical observations of small LD blocks in derived populations with different demographic histories, such as those shaped by founder effects [52]. Transient LD initially generated by genetic drift gradually decays with time owing to recombination [42], and the same holds for transient $F_{st}$-correlations between a pair of linked neutral sites. Unlike LD generated by genetic drift, LD generated by migration can be maintained as long as inter-population migration continues [19, 50]. Thus, on average, $F_{st}$-correlation blocks might become larger even though $F_{st}$ at individual sites decreases as the migration rate increases [53].

Similar to migration, neutral mutation reduces population differentiation (e.g., $F_{st} = [1 + 2N(m + v)]^{-1}$ under the classical infinite island model, where $m$ is the migration rate and $v$ is the mutation rate [53]). This facilitates genomic convergence among populations and increases $F_{st}$-correlation block sizes. However, this may not be the case for selective sites, where mutants favorable in different habitats increase $F_{st}$ [54] and produce different associations with linked sites on the same chromosome. The joint effects of mutation and selection can increase or decrease $F_{st}$-correlation block sizes, depending on whether these joint effects are consistent across subpopulations.

Again, the remaining challenge is to disentangle the relative effects of different evolutionary forces from the pattern of $F_{st}$-correlations. Migration and genetic drift help to increase the average size of $F_{st}$-correlation blocks, whereas selection and mutation tend to produce a pattern of varied block sizes. These results vary with the structure and history of populations.

62 Current Genomics, 2011, Vol. 12, No. 1

Hu et al.

Methods for Measuring Inter-Site $F_{st}$-Correlations

Several software packages are currently available to estimate $F_{st}$ at individual sites, but the estimation of $F_{st}$-correlation has not been fully developed [25]. Here, we discuss the application of the method developed by Cockerham and Weir [55] to estimating $F_{st}$-correlation. Consider population genomic datasets in which all sampled individuals are sequenced, such as the publicly available genomes of human and cattle populations. Pairs of alleles at each of two linked sites fall into two genic hierarchical levels: alleles in different individuals within the same subpopulation, and alleles in different subpopulations within the same population. Let $x_{ikl}$ be an indicator variable, where $i$ indicates the location of the allele, and $k$ and $l$ are the alleles at the first and second sites, respectively. When the alleles are $k$ and $l$ at the first and second sites, $x_{ikl} = 1$; otherwise, $x_{ik'l'} = 0$ ($k' \neq k$, or $l' \neq l$, or both). The expectation of $x_{ikl}$ across all subpopulations is $E(x_{ikl}) = p_{kl}$, where $p_{kl}$ is the gametic frequency. Because the indicator variable follows a binomial distribution, its variance is $E(x_{ikl}^2) - [E(x_{ikl})]^2 = p_{kl}(1 - p_{kl})$.

Let $\theta_{ii'(kl)}$ be the correlation between $x_{ikl}$ and $x_{i'kl}$, $\theta_{ii'(k)}$ be the correlation between $x_{ik}$ and $x_{i'k}$ at the first site, and $\theta_{ii'(l)}$ be the correlation between $x_{il}$ and $x_{i'l}$ at the second site. $\theta_{ii'(k)}$ and $\theta_{ii'(l)}$ can be estimated using the analysis of variance (ANOVA). Using the same notation as Cockerham and Weir [55], let $\theta_{n} = \theta_{ii'}$, where $n = 1$ when $i$ and $i'$ are from the same subpopulation ($i = i'$), and $n = 2$ when $i$ and $i'$ are from different subpopulations. The expectation of the product of the indicators for a pair of gametes, each carrying alleles at the two sites, can be expressed as

$$E(x_{ikl}\,x_{i'kl}) = p_{kl}^2 + \theta_{n(kl)}\,p_{kl}(1 - p_{kl}). \tag{4}$$

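The two-site correlation $\theta_{n(kl)}$ can be further decomposed as

$$\theta_{n(kl)} = \theta_{n(k)}\,\theta_{n(l)} + \operatorname{cov}(\theta_{n(k)}, \theta_{n(l)}). \tag{5}$$

The $F_{st}$-correlation can then be calculated as $\operatorname{cov}(\theta_{1(k)}, \theta_{1(l)}) / [\operatorname{var}(\theta_{1(k)})\operatorname{var}(\theta_{1(l)})]^{1/2}$, where $\operatorname{var}(\theta_{1(k)})$ and $\operatorname{var}(\theta_{1(l)})$ can be estimated using conventional methods such as bootstrapping.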
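A minimal sketch of the correlation step, assuming bootstrap replicates obtained by resampling subpopulations jointly at both sites. For brevity it swaps in a simple moment estimator, $\operatorname{Var}(p)/[\bar{p}(1 - \bar{p})]$ over subpopulation allele frequencies, in place of the Cockerham and Weir ANOVA estimator of $\theta_{1}$; the function names, the resampling scheme, and the allele frequencies are illustrative assumptions.

```python
import random
import statistics

def simple_fst(freqs):
    """Moment estimator Var(p) / [p_bar (1 - p_bar)] over subpopulation allele
    frequencies. Ignores within-subpopulation sampling; a stand-in for theta_1."""
    p_bar = statistics.fmean(freqs)
    denom = p_bar * (1.0 - p_bar)
    if denom == 0.0:
        return 0.0
    return statistics.pvariance(freqs, mu=p_bar) / denom

def fst_correlation(freqs_k, freqs_l, n_boot=500, seed=1):
    """cov(F_k, F_l) / sqrt(var(F_k) var(F_l)) over bootstrap replicates.
    Subpopulations are resampled with replacement, jointly for both sites."""
    rng = random.Random(seed)
    n_pop = len(freqs_k)
    reps_k, reps_l = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n_pop) for _ in range(n_pop)]
        reps_k.append(simple_fst([freqs_k[i] for i in idx]))
        reps_l.append(simple_fst([freqs_l[i] for i in idx]))
    mean_k, mean_l = statistics.fmean(reps_k), statistics.fmean(reps_l)
    cov = sum((a - mean_k) * (b - mean_l) for a, b in zip(reps_k, reps_l)) / (n_boot - 1)
    return cov / (statistics.stdev(reps_k) * statistics.stdev(reps_l))

# Hypothetical allele frequencies at two linked sites in six subpopulations
site_k = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2]
site_l = [0.2, 0.25, 0.6, 0.5, 0.8, 0.4]
print(round(fst_correlation(site_k, site_l), 3))
```

Resampling the same subpopulations for both sites is what preserves their covariance; resampling each site independently would drive the estimated correlation toward zero.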

To employ Cockerham and Weir's [55] method for estimating $\theta_{n(kl)}$, $Q_n = \sum_k \sum_l E(x_{ikl}\,x_{i'kl})$ can be expressed as

$$Q_n = q + \theta_{n(kl)}(1 - q), \tag{6}$$

where $q = \sum_k \sum_l p_{kl}^2$. Here the correlation $\theta_{n(kl)}$ is treated as a constant for the two given sites. Eq. (6) has the same form as that of Cockerham and Weir ([55], p. 8512). Only two hierarchical variance components are considered: the variance within subpopulations ($\sigma_1^2$) and the variance among subpopulations ($\sigma_2^2$), where $\sigma_1^2 = 1 - Q_1 = (1 - \theta_{1(kl)})(1 - q)$, $\sigma_2^2 = \theta_{1(kl)}(1 - q)$, and $\theta_{1(kl)} = \sigma_2^2/(\sigma_1^2 + \sigma_2^2)$. Once $\theta_{1(kl)}$ is available from ANOVA [25, pp. 171-176], $\operatorname{cov}(\theta_{1(k)}, \theta_{1(l)})$ can be estimated from Eq. (5).
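The bookkeeping in Eqs. (4) and (6) can be checked numerically. The sketch below uses hypothetical haplotype frequencies and an assumed common $\theta_{1(kl)}$, confirms that summing Eq. (4) over all haplotypes gives Eq. (6), and recovers $\theta_{1(kl)}$ from the two variance components. The function name is illustrative.

```python
def two_site_components(p_kl, theta):
    """Given gametic frequencies p_kl (summing to 1) and a common two-site
    correlation theta, return q, Q_1 and the two variance components."""
    q = sum(p * p for p in p_kl)          # q = sum_k sum_l p_kl^2
    Q1 = q + theta * (1.0 - q)            # Eq. (6) with n = 1
    sigma1_sq = 1.0 - Q1                  # within subpopulations
    sigma2_sq = theta * (1.0 - q)         # among subpopulations
    return q, Q1, sigma1_sq, sigma2_sq

p = [0.4, 0.3, 0.2, 0.1]   # hypothetical frequencies of haplotypes AB, Ab, aB, ab
theta = 0.2                # assumed theta_1(kl)
q, Q1, s1, s2 = two_site_components(p, theta)

# Summing Eq. (4) over haplotypes reproduces Eq. (6)
Q1_from_eq4 = sum(pk * pk + theta * pk * (1.0 - pk) for pk in p)
# theta_1(kl) = sigma_2^2 / (sigma_1^2 + sigma_2^2)
theta_back = s2 / (s1 + s2)
print(round(q, 4), round(Q1, 4), round(theta_back, 4))
```

The recovery works because $\sigma_1^2 + \sigma_2^2 = 1 - q$ regardless of $\theta$, so the ratio isolates $\theta_{1(kl)}$.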

Since the calculation of $F_{st}$ involves heterozygosities in the subpopulations and in the global population, $F_{st}$-correlation is related to the correlation of heterozygosities at these two levels. Wright [53] showed that $1 - F_{it} = (1 - F_{is})(1 - F_{st})$, from which we can show that the $F_{st}$-correlation is related to the correlation of heterozygosities at the global ($H_{it}$) and local ($H_{is}$) levels. Writing $H_{it} = 1 - F_{it}$ and $H_{is} = 1 - F_{is}$, for two linked sites $i$ and $j$ we obtain

$$\operatorname{cov}(H_{it,i}, H_{it,j}) = \operatorname{cov}(H_{is,i}, H_{is,j}) + \operatorname{cov}(F_{st,i}, F_{st,j}) - A, \tag{7}$$

where

$$A = \operatorname{cov}(H_{is,i}, F_{st,j}H_{is,j}) + \operatorname{cov}(H_{is,j}, F_{st,i}H_{is,i}) + \operatorname{cov}(F_{st,j}, F_{st,i}F_{is,i}) + \operatorname{cov}(F_{st,i}, F_{st,j}F_{is,j}) - \operatorname{cov}(F_{st,i}F_{is,i}, F_{st,j}F_{is,j}).$$

The inter-site $F_{st}$ covariance is thus related to the inter-site covariances of heterozygosities at the global and local levels. This also implies that the inter-site $F_{st}$ covariance is ultimately associated with the gametic LD at the global and local levels.
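Eq. (7) follows from the bilinearity of covariance, so it also holds exactly for sample covariances. The sketch below checks this on simulated replicate values of $F_{st}$ and $F_{is}$ at two sites, taking $H_{is} = 1 - F_{is}$ and $H_{it} = H_{is}(1 - F_{st})$; the distributions used to generate the values are arbitrary assumptions chosen only to exercise the identity.

```python
import random

def cov(x, y):
    """Sample covariance with an (n - 1) denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def mul(a, b):
    """Element-wise product of two equal-length lists."""
    return [u * v for u, v in zip(a, b)]

def check_eq7(n_rep=200, seed=7):
    rng = random.Random(seed)
    # Hypothetical replicate values of F_st and F_is at sites i and j
    fst_i = [rng.uniform(0.0, 0.3) for _ in range(n_rep)]
    fst_j = [f + rng.gauss(0.0, 0.02) for f in fst_i]   # correlated with site i
    fis_i = [rng.uniform(-0.1, 0.1) for _ in range(n_rep)]
    fis_j = [rng.uniform(-0.1, 0.1) for _ in range(n_rep)]
    his_i = [1.0 - f for f in fis_i]                     # H_is = 1 - F_is
    his_j = [1.0 - f for f in fis_j]
    # 1 - F_it = (1 - F_is)(1 - F_st), so H_it = H_is (1 - F_st)
    hit_i = [h * (1.0 - f) for h, f in zip(his_i, fst_i)]
    hit_j = [h * (1.0 - f) for h, f in zip(his_j, fst_j)]
    A = (cov(his_i, mul(fst_j, his_j)) + cov(his_j, mul(fst_i, his_i))
         + cov(fst_j, mul(fst_i, fis_i)) + cov(fst_i, mul(fst_j, fis_j))
         - cov(mul(fst_i, fis_i), mul(fst_j, fis_j)))
    lhs = cov(hit_i, hit_j)
    rhs = cov(his_i, his_j) + cov(fst_i, fst_j) - A
    return lhs, rhs

print(check_eq7())
```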

Local and Global Gametic LDs 

The difference between inter-site heterozygosity correlations at the global and local levels is related to the inter-site $F_{st}$-correlation. If population differentiation is absent, the inter-site correlations of heterozygosities should be equal at the two levels. Thus, the inter-site $F_{st}$-correlation can be inferred from changes in the global and local gametic LDs, since zygotic associations are functions of gametic LD [27]. Here, we briefly discuss the global and local LDs in structured populations, which indirectly affect the $F_{st}$-correlation and its distribution.

The amounts of global and local LD differ because they decay at unequal rates. This promotes divergence between the pattern of correlation blocks within the whole population (e.g., the pattern of LD blocks or $H_e$-correlation blocks) and the pattern of $F_{st}$-correlation blocks among subpopulations. For instance, we may compare the decay of the two transient LDs by synthesizing the results of Wright [56] and Hill and Robertson [42] for a purely neutral process. Suppose that a population is subdivided into $n$ subpopulations, each with the same constant effective size $N_{e(\mathrm{local})}$. Random sampling acts independently on individual subpopulations. Consider two diallelic linked neutral sites with recombination rate $r$ between them. Assume that all subpopulations begin with the same allele frequencies as the entire population. Let $D_0$ be the initial gametic linkage disequilibrium in the global population or in any initial subpopulation. According to Wright [53], the population differentiation $F_{st(t)}$ at each neutral site at generation $t$ can be expressed as

$$F_{st(t)} = 1 - \left(1 - \frac{1}{2N_{e(\mathrm{local})}}\right)^{t}. \tag{8}$$
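From Hill and Robertson [42], the expected LD in each subpopulation at generation $t$, $E(D_{\mathrm{local}(t)})$, is expressed as

$$E(D_{\mathrm{local}(t)}) = (1 - r)^{t}\left(1 - \frac{1}{2N_{e(\mathrm{local})}}\right)^{t} D_0. \tag{9}$$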
Let $N_{e(\mathrm{global})(t)}$ be the effective global population size at generation $t$. From Wright [56], $N_{e(\mathrm{global})(t)}$ can be expressed as

$$N_{e(\mathrm{global})(t)} = \frac{nN_{e(\mathrm{local})}}{1 - F_{st(t)}}. \tag{10}$$
Let $E(D_{\mathrm{global}(t)})$ be the expected global LD at generation $t$. From Hill and Robertson [42] and Eq. (10), we obtained $E(D_{\mathrm{global}(t)})$:
Combining (8), (9), and (11) yields
Fig. (6) shows that, in a purely neutral process, the local LD decays more rapidly with time than the global LD as $F_{st}$ increases. This is because population differentiation increases the effective global population size, which in turn reduces global genetic drift and slows the decay of the global LD.
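The sketch below evaluates Eqs. (8)-(10) with the Fig. (6) settings $N_{e(\mathrm{local})} = 20$ and $n = 50$. The recombination rate $r = 0.01$ is a hypothetical choice, and the global LD of Eq. (11) is not computed here; the sketch only shows how $F_{st}$ and the global effective size grow while local LD decays.

```python
def fst_t(t, ne_local):
    """Eq. (8): F_st(t) = 1 - (1 - 1/(2 N_e(local)))^t."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne_local)) ** t

def local_ld_ratio(t, ne_local, r):
    """Eq. (9) divided by D_0: E(D_local(t)) / D_0."""
    return ((1.0 - r) * (1.0 - 1.0 / (2.0 * ne_local))) ** t

def ne_global_t(t, ne_local, n):
    """Eq. (10): N_e(global)(t) = n N_e(local) / (1 - F_st(t))."""
    return n * ne_local / (1.0 - fst_t(t, ne_local))

NE_LOCAL, N_SUBPOPS = 20, 50   # Fig. (6) settings
R = 0.01                       # hypothetical recombination rate
for t in (0, 10, 50, 100):
    print(t,
          round(fst_t(t, NE_LOCAL), 3),
          round(local_ld_ratio(t, NE_LOCAL, R), 3),
          round(ne_global_t(t, NE_LOCAL, N_SUBPOPS), 1))
```

As $F_{st(t)}$ approaches 1, $N_{e(\mathrm{global})(t)}$ grows without bound, which is the mechanism the text gives for the slower decay of global LD.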

In the presence of other evolutionary forces, such as changes in local LD caused by the joint effects of inter-population gene flow and natural selection [50], these relationships could deviate from the expectation under the neutral process. The relationship between $N_{e(\mathrm{global})}$ and $N_{e(\mathrm{local})}$ becomes more complex in the presence of natural selection:

$$N_{e(\mathrm{global})} = \frac{nN_{e(\mathrm{local})}}{(1 + V)(1 - F_{st}) + 2N_{e(\mathrm{local})}F_{st}V}$$

for the selective sites, where $V$ is the variance in fitness among subpopulations [57]. Also, for neutral sites in plant species, population differentiation becomes

$$F_{st} = \left[1 + \frac{2N_{e(\mathrm{local})}\,m\,n^{2}}{(n - 1)^{2}}\right]^{-1},$$

where $n$ is the number of subpopulations and the migration rate $m$ takes different forms for alleles with different modes of inheritance in plants [58, 59]. All these scenarios can change the global LD. The global genetic drift for jointly neutral sites is not the same as that for jointly selective sites. Similarly, the global LD affecting jointly neutral sites is not the same as that affecting jointly selective sites. An intermediate situation is the transient global LD between selective and neutral sites, since genetic hitchhiking modifies their LD as well as the LD in local populations. These different scenarios can affect $F_{st}$-correlation blocks and their distribution along the chromosomes.
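A small sketch of the selective-site expression for $N_{e(\mathrm{global})}$, in the form given above, mainly to show that it reduces to Eq. (10) when $V = 0$ and that variance in fitness among subpopulations lowers the global effective size. The function name and parameter values are hypothetical.

```python
def ne_global_selective(ne_local, n, fst, V):
    """N_e(global) with variance in fitness V among subpopulations [57],
    in the form given in the text; V = 0 recovers n N_e(local) / (1 - F_st)."""
    return n * ne_local / ((1.0 + V) * (1.0 - fst) + 2.0 * ne_local * fst * V)

# Hypothetical settings: N_e(local) = 20, n = 50, F_st = 0.2
for V in (0.0, 0.01, 0.05):
    print(V, round(ne_global_selective(20, 50, 0.2, V), 1))
```

Because the term $2N_{e(\mathrm{local})}F_{st}V$ scales with local size and differentiation, even modest fitness variance can offset the increase in global size that differentiation produces under neutrality.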
Fig. (6). A comparison between the global and local LDs under pure genetic drift. The results indicate that the expected global LD at generation $t$, $E(D_{\mathrm{global}(t)})$, is greater than the expected local LD, $E(D_{\mathrm{local}(t)})$. Results are calculated based on Eqs. (8)-(12), with the effective size of each subpopulation $N_{e(\mathrm{local})} = 20$ and the number of subpopulations $n = 50$.

Density Distribution of $F_{st}$-Correlations

There are few empirical studies on the density distribution of $F_{st}$-correlation blocks, although some reports are available on the density distribution of individual $F_{st}$'s [60]. In a neutral process, genetic drift and recombination gradually erode $F_{st}$-correlations, whereas migration increases them. This eventually leads to steady-state distributions of $F_{st}$ [53] and of $F_{st}$-correlation. The non-random distribution of recombination along chromosomes facilitates the generation of different $F_{st}$-correlation blocks. Sites separated by a shorter distance have a correspondingly lower recombination rate, which helps to maintain small haplotypic blocks. Compared with gametic LD, $F_{st}$-correlation (a higher-order association) is also weaker. We therefore expect a large number of small $F_{st}$-correlation blocks and a few large ones, giving a highly skewed distribution.
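Any block-size distribution presupposes a rule for delimiting blocks. As an illustration only, the sketch below partitions ordered sites with a hypothetical greedy rule: a site joins the current block if its absolute correlation with every site already in the block reaches an arbitrary threshold. This is not a method proposed in the text, and other block-definition rules would give different size distributions.

```python
def correlation_blocks(corr, threshold):
    """Greedy left-to-right partition of ordered sites into contiguous blocks in
    which every pair of sites has |correlation| >= threshold.
    corr: symmetric matrix (list of lists) of pairwise correlations."""
    blocks, current = [], [0]
    for s in range(1, len(corr)):
        if all(abs(corr[s][t]) >= threshold for t in current):
            current.append(s)          # s is strongly correlated with the whole block
        else:
            blocks.append(current)     # close the block and start a new one at s
            current = [s]
    blocks.append(current)
    return blocks

# Hypothetical 6-site matrix: sites 0-2 and sites 4-5 are strongly correlated
n_sites = 6
corr = [[1.0 if i == j else 0.1 for j in range(n_sites)] for i in range(n_sites)]
for i, j in [(0, 1), (0, 2), (1, 2), (4, 5)]:
    corr[i][j] = corr[j][i] = 0.8
blocks = correlation_blocks(corr, 0.5)
print(blocks, [len(b) for b in blocks])
```

The list of block sizes is the raw material for the density distribution discussed above; with genome-wide data it could be tabulated and checked for skewness.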

Selection can modify the distribution of $F_{st}$-correlation blocks. If each block contains only one selective site (e.g., an adaptive QTN) together with many neutral sites, the distribution of $F_{st}$-correlation block sizes as a whole reflects the distribution of the effects of selective sites along the chromosomes. A site with a large selection intensity or gene effect is expected to have a large $F_{st}$-correlation block owing to genetic hitchhiking, and the number of blocks is likely equal to the number of $F_{st}$ outliers [54]. If the effects of all selective sites follow a gamma distribution [61, 62], we hypothesize that the sizes of $F_{st}$-correlation blocks may follow the same kind of distribution. When multiple selective sites are involved in individual $F_{st}$-correlation blocks, the number of blocks differs from the number of selective sites, and the block-size distribution is likely of a type other than the negative exponential distribution. This requires further empirical tests.

When other genetic statistics, such as Nei's distance, are used to describe population genetic differentiation, different block sizes and distribution patterns could be produced on the same chromosome. The sensitivities of $F_{st}$ and other statistics to population differentiation at individual sites have not been compared. Differential sensitivities to natural selection and genetic hitchhiking can influence the sizes of correlation blocks and their distribution patterns for a given array of subpopulations.

GENOME ARCHITECTURE AND CORRELATION BLOCKS

Eukaryotic genome assembly has some explicit features, such as the presence of multigene families and transposable elements (TE). These features could affect the size and distribution of correlation blocks within and between populations. Here, we discuss these potential effects separately, including the effects of multigene families, TE, and sequence repeats. In each case, we begin with the effects of these features on correlation blocks within populations (gametic or zygotic associations), followed by their effects on $F_{st}$-correlation blocks among populations.

Effects of Multigene Families 

Multigene families account for a certain percentage of the whole genome. Consider the correlation blocks within subpopulations in terms of gene families. Empirical studies on the relation between inter-site associations and multigene families are not available, but the density distribution of multigene family size has been reviewed in model organisms [17]. One conjecture is that multigene families could shape correlation blocks and their distribution in two ways, or in a mixture of the two: either each family member forms one or more correlation blocks, or only partial segments of each family member are involved in the correlation blocks.

Correlation blocks can be altered by the processes that generate and maintain multigene families. Gene conversion and unequal crossing-over are the common processes, although other mechanisms of concerted evolution have been proposed [63, 64]. Biased gene conversion driven by natural selection can accelerate homogenization among family members. When the evolution of a multigene family is at steady state, individual correlation blocks in terms of family members are likely similar in size even if the number of members varies among individuals. When the evolution of a multigene family remains in a transient state, a variant repeat has not yet spread to all other family members, and the sizes of correlation blocks could vary substantially among family members. Similar outcomes can be expected when multigene families change through unequal crossing-over. Theoretical studies have shown that the probability that two multigene family members are identical decreases exponentially with their distance along the chromosome [65], implying the presence of correlation blocks among family members under the neutral hypothesis (mutation, genetic drift, intrachromosomal unequal crossing-over, and interchromosomal equal crossing-over).

The sizes of correlation blocks are related to the structure and function of the multigene family and to the interspersed coding/noncoding sequences between family members. The number and lengths of noncoding regions within each family member affect the genetic divergence among members, owing to the different mutation rates of coding and noncoding regions. Consequently, these regions act as a biological barrier to the spread of advantageous variants to all other members through unequal crossing-over and modify the distribution of correlation blocks. When selection intensities differ among the interspersed segments, the size of the correlation block for each family member should change. When the interspersed sequences are solely noncoding, an explicit separation between individual blocks is expected.

With reference to $F_{st}$-correlation blocks in terms of family members, distinct selection among populations facilitates gene conversion or unequal crossing-over. However, locally adaptive variants might not spread to other members at the same speed in different populations. As a result, the sizes of $F_{st}$-correlation blocks may vary among family members.

The exchange of genomes among populations acts as a biological barrier to the spread of locally adaptive variants among family members when variants in the migrating genomes are maladaptive in the recipient populations, similar to the presence of migration load (the reduction of population fitness due to maladaptive immigrants) [66, 67]. Recombination of immigrant maladaptive variants with resident genomes through mating reduces the mean fitness of recipient populations. However, replacement of local genomes can be accelerated when all or most members of the multigene family in the migrating genomes are better adapted to the local populations ([53], pp. 36-38). The spread of adaptive variants to all other members accelerates when the rate of gene conversion or unequal crossing-over is high. The $F_{st}$-correlation blocks and their distribution in terms of multigene family members then quickly converge among populations, analogous to the role of gene flow in reducing population differentiation at a single locus.



Correlation Blocks and Population Structure 

In theory, population differentiation can affect the correlation blocks (gametic or zygotic LD) in the global population in terms of family members. Fig. (7) shows how population structure ($F_{st}$) changes the identity coefficient between gene family members, based on a synthesis of the results of Wright [56] and Kimura and Ohta [65]. Results are calculated by substituting $N_{e(\mathrm{global})} = nN_{e(\mathrm{local})}/(1 - F_{st})$ under the neutral process [56] for $N$ in $b = 2N\beta/(1 + 4Nv)$ of Kimura and Ohta's Eq. (18), where $\beta$ is the rate of interchromosomal crossing-over and $v$ is the mutation rate per family member. Eq. (18) gives the identity coefficient $f(x)$ between family members at recombination rate $x$ as an integral over time $t$, in which $\alpha$ is a constant related to intrachromosomal unequal crossing-over. Population differentiation ($F_{st} \neq 0$) increases the effective global population size and hence facilitates interchromosomal crossing-over, which in turn reduces the genetic correlation (Fig. 7). This result implies that local population differentiation promotes divergence in correlation block size in the global population.
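The substitution step behind Fig. (7) can be sketched as follows, using the caption's parameters ($n = 50$, $N_{e(\mathrm{local})} = 50$, $v = 10^{-5}$, $\beta = 0.001$). Only $b$ is computed; the integral for $f(x)$ in Kimura and Ohta's Eq. (18) is not reproduced, and the function name is hypothetical.

```python
def b_parameter(ne_local, n, fst, beta=0.001, v=1e-5):
    """b = 2 N beta / (1 + 4 N v), with N replaced by the neutral global size
    N_e(global) = n N_e(local) / (1 - F_st)."""
    N = n * ne_local / (1.0 - fst)
    return 2.0 * N * beta / (1.0 + 4.0 * N * v)

for fst in (0.0, 0.2, 0.5):
    print(fst, round(b_parameter(50, 50, fst), 3))
```

With these values $b$ rises from about 4.55 at $F_{st} = 0$ to about 8.33 at $F_{st} = 0.5$, reflecting the larger global effective size.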
Fig. (7). Effects of population differentiation on the identity coefficient between family members in the global population. Results are calculated according to Kimura and Ohta [65] and Wright [56] under the neutral process (see the formula in the main text). Parameters used in the figure are the number of local populations $n = 50$, the effective size of each local population $N_{e(\mathrm{local})} = 50$, the mutation rate per family member per generation $v = 10^{-5}$, the constant $\alpha = 0.1$, and the rate of interchromosomal crossing-over per generation $\beta = 0.001$. The Y-axis represents the identity coefficient between family members at recombination rate (distance) $x = 0.1$ on a chromosome. Note that migration within the global population does not change any parameter in $f(x)$ other than the effective population size.

Analogous to its effects on population differentiation, genetic drift helps to diversify the pattern of $F_{st}$-correlation blocks in terms of multigene families. Small effective population sizes increase the fixation probability of maladaptive variants [68, 69] and impede the spread of adaptive variants to all family members through unequal crossing-over or gene conversion. This contrasts with the outcome in populations with large effective sizes.
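To illustrate why small effective size favors the fixation of maladaptive variants, the sketch below uses Kimura's standard diffusion approximation for the fixation probability of a new mutant, $u(p) = (1 - e^{-4N_e s p})/(1 - e^{-4N_e s})$ with $p = 1/(2N_e)$. This textbook formula is swapped in for illustration and assumes census size equals effective size; the selection coefficient is hypothetical.

```python
import math

def fixation_probability(ne, s, p=None):
    """Kimura's u(p) = (1 - exp(-4 N s p)) / (1 - exp(-4 N s)), census size = N_e.
    p defaults to a single new copy, 1/(2 N_e). May overflow for very large 4 N_e |s|."""
    if p is None:
        p = 1.0 / (2.0 * ne)
    if s == 0:
        return p                      # neutral limit
    # expm1 keeps precision when 4 N s p is small
    return math.expm1(-4.0 * ne * s * p) / math.expm1(-4.0 * ne * s)

s = -0.01   # hypothetical maladaptive variant
for ne in (10, 100, 1000):
    print(ne, fixation_probability(ne, s))
```

With $s = -0.01$, a new maladaptive copy fixes with probability of about 0.04 when $N_e = 10$, but the probability is vanishingly small when $N_e = 1000$.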
A current challenge is to decipher the relative contributions of different evolutionary forces to maintaining multigene families [70, 71]. Methods are needed that evaluate the observed pattern of correlation blocks in terms of multigene families, to better understand the underlying evolutionary processes. This is feasible for species whose family members can be mapped from whole-genome sequences, enabling analysis of their correlation blocks.

Perturbation from Transposable Elements 

The processes that maintain the number of TE copies in a population are complex [72-76]. The effects of transposition on host genomes are associated with the intensities of selection on (i) the transposable elements themselves (positive or negative) and (ii) the modified host sequences. A positive effect facilitates the spread of TE in a population until other forces, such as genetic drift, counteract their replication [77]. The number of TE copies does not increase indefinitely, even though the number of potential sites for transposition is sufficiently large [73, 76]. When negative effects act on the host genomes, such as insertion into coding regions, TE abundance is maintained by a balance between selection and replication [72, 73]. When selection intensity is of the same order as the effect of genetic drift, a replication-drift mechanism cannot be excluded.

Empirical studies demonstrate that TE can be sources of variation through insertion into different regions of a gene, such as exons, introns, and regulatory regions of host genes (see the review by Kidwell and Lisch [74]). The perturbation of correlation blocks within subpopulations by TE is likely related to how and where transposition has occurred in the host genomes. When neutral TE are inserted into noncoding regions adjacent to selective sites [74], the original correlation blocks likely expand or become more separated owing to the extension of neutral segments and the effects of genetic hitchhiking. In contrast, when neutral TE are inserted into adaptive coding regions [74], the original correlation blocks likely break into smaller and more numerous blocks. When selective TE are inserted into noncoding regions, new blocks likely arise, with sizes related to the strength of selection against the TE owing to genetic hitchhiking [78]. When selective TE are inserted into coding regions, the original block sizes could change to various degrees, probably depending on how far the TE are located from the original selective outliers. These conjectures suggest that a complex relation might exist between the effects of TE and the pattern of correlation blocks.

Similarly, a complex relationship might exist between the effects of TE and $F_{st}$-correlation blocks. Studies have shown population differentiation for TE under genetic drift, mutation, and other forces [79-81]. Differential selection against the same TE produces unequal TE abundances among populations. In addition, differences in effective population size generate unequal genetic drift effects on the spread of TE among individuals. Just as the number of TE in a subpopulation is finite, the joint effects of multiple forces (e.g., selection and genetic drift) on the spread of TE will eventually lead to a finite number of $F_{st}$-correlation blocks.

Migration and demographic history can affect the dynamics of TE within genomes and structured populations [81]. The distribution of TE copy number among subpopulations can be modified by the relative migration rate, the transposition rate, and the strength of selection against the deleterious effects of TE. Homogenization of TE copy number by migration may take a long time in structured populations. Similarly, inter-population migration homogenizes the perturbation effects on $F_{st}$-correlation blocks. One likely consequence is that migrating genomes could introduce different TE copy numbers, and hence change the number of blocks in the recipient populations, when TE are neutral or nearly neutral under the infinite-allele model. This is analogous to the increase in rare allele richness (or rare species richness) due to immigration under the infinite-allele model (or infinite-species model) in molecular population genetics (or in neutral community ecology) [76, 82-84]. Another likely scenario is that immigrating TE cause migration load when they are maladaptive in the recipient populations, which consequently alters the pattern of $F_{st}$-correlation blocks. These analyses suggest that a very complex pattern of $F_{st}$-correlation blocks might occur under the joint effects of migration and other forces.

Population differentiation can affect the distribution of TE abundance in the global population, which subsequently affects the correlation blocks within and between subpopulations. Under the neutral process, population differentiation can increase the number of transposable sites for TE with low frequencies in the global population (Fig. 8A). These values were calculated by substituting $N_{e(\mathrm{global})} = nN_{e(\mathrm{local})}/(1 - F_{st})$ under the neutral process [56] for $N$ in $\theta = 4N_{e(\mathrm{global})}v$ of Eq. (2) of Ohta [76],

$$G(x) = n_{TE}\,\theta\,(1 - x)^{\theta - 1}x^{-1},$$

where $n_{TE}$ is the average number of TEs per genome. $G(x)$ is the function such that $G(x)\,dx$ represents the number of TE transposable sites whose frequencies lie within $x$ to $x + dx$, with the allelic frequencies summing to 1. Large population differentiation increases the effects of low-frequency TE on the correlation blocks in the global population. Population differentiation also facilitates the accumulation of the total number of existing TE (Fig. 8B). However, this can be modified in a non-neutral process, where the effective size of the global population is reduced by variation in fitness among populations [57, 72, 73].
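The abundances in Fig. (8) are integrals of $G(x)$. A minimal numerical sketch, assuming the Fig. (8) parameters ($n = 30$, $N_{e(\mathrm{local})} = 30$, $v = 10^{-4}$, $n_{TE} = 10$): the factor $1/x$ is split as $1 + (1 - x)/x$, so that one part integrates exactly and the remainder becomes smooth under $x = e^{u}$ and is handled by Simpson's rule. Function names and the quadrature choice are illustrative, not taken from Ohta [76].

```python
import math

def ne_global(ne_local, n, fst):
    """Neutral global effective size, N_e(global) = n N_e(local) / (1 - F_st) [56]."""
    return n * ne_local / (1.0 - fst)

def te_abundance(x1, x2, theta, n_te, steps=2000):
    """Integrate G(x) = n_TE * theta * (1 - x)**(theta - 1) / x over [x1, x2], 0 < x1 < x2 <= 1.

    Uses 1/x = 1 + (1 - x)/x: the (1 - x)**(theta - 1) part is integrated exactly,
    and theta * (1 - x)**theta / x becomes theta * (1 - e**u)**theta after x = e**u,
    which is integrated with composite Simpson's rule.
    """
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    exact = (1.0 - x1) ** theta - (1.0 - x2) ** theta
    a, b = math.log(x1), math.log(x2)
    h = (b - a) / steps

    def g(u):
        return (1.0 - math.exp(u)) ** theta

    inner = sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, steps))
    simpson = (g(a) + g(b) + inner) * h / 3.0
    return n_te * (exact + theta * simpson)

# Fig. (8) parameters: n = 30, N_e(local) = 30, v = 1e-4, n_TE = 10
for fst in (0.0, 0.2, 0.5):
    N = ne_global(30, 30, fst)
    theta = 4.0 * N * 1e-4
    total = te_abundance(1.0 / (2.0 * N), 1.0, theta, 10)   # total number of TE sites
    print(f"F_st={fst:.1f}  theta={theta:.2f}  total={total:.1f}")
```

Interval abundances as in Fig. (8A) follow from the same function, e.g. `te_abundance(0.1, 0.2, theta, 10)`.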

Mutation could change the structure of TE, and hence their function in host genomes, as implied by studies on the types of TE and their evolutionary relationships [70, 85]. The consensus is that favorable TE mutants tend to spread in a population, whereas unfavorable ones could be rapidly removed from their resident populations. The fate of new TE mutants (extinction or persistence in the sub- or global population) could influence correlation blocks, and this awaits further research.



The effects of TE perturbation further complicate the assessment of correlation blocks within and among populations and their distribution. One possible approach is to identify TE in the genome sequences of model organisms and to investigate their diversity within and among populations [45]. This helps to predict whether TE perturbation is negligible in modifying the number and sizes of correlation blocks.

In general, TE perturbation increases uncertainty in the size of correlation blocks, leading to a dynamic distribution of block number and size. Whether the effects of such perturbation are linearly additive remains to be studied, but this uncertainty could be substantial, depending partly on the function of TE, their abundances, and their effects on host genomes.

Effects of Nongenic Sequences and GC Isochores 

The genomic structure of eukaryotes is characterized by abundant repetitive interspersed nongenic sequences, as in pine genomes [86]. Highly repetitive sequences, each a few to hundreds of nucleotides long, aid in the formation of correlation blocks within populations, especially when they are neutral and act as spacers flanking correlation blocks. Repetitive sequences of hundreds to thousands of nucleotides facilitate the formation of medium-sized correlation blocks when they are selective and contain outliers; otherwise they function like the highly repetitive sequences. Tandem repetitive sequences are expected to be less effective than interspersed repetitive sequences in shaping the number and size of correlation blocks within populations. Single-copy sequences often code for functional genes and contain outliers, such as $H_e$ and $F_{st}$ outliers. No empirical studies have yet examined the relations between repetitive sequences and correlation blocks.

The processes that maintain repetitive sequences (mainly nongenic DNA) are complex. They include transposition, replication slippage, unequal sister-chromatid exchange, and interchromosomal unequal crossing-over [70, 87], some of which have been discussed in the preceding two subsections. Recombination within and between chromosomes is affected by recombination heterogeneity along the chromosomes [29, 31]. In addition, tandem and interspersed repetitive sequences can spread through different paths in a population. For instance, the variable number of tandem repeats (VNTR) among chromosomes implies high polymorphism among individuals within populations. However, the number of repetitive sequences should be finite owing to the balance between extinction by genetic drift and formation by replication (a kind of mutation), provided that the repetitive sequences are neutral. The distribution of the number and size of repetitive sequences among individuals varies among populations of different effective sizes [88], facilitating the formation of distinct $F_{st}$-correlation blocks.

Migration reduces population differences in the number and size of repetitive sequences, given that migrating genomes recombine with the genomes in the recipient populations. The presence of nongenic repeats increases the probability of genetic hitchhiking [89], and hence modifies the $F_{st}$-correlation blocks. However, this condition rarely occurs in prokaryotic genomes, where nongenic DNA is absent or accounts for a very small proportion of the genome [70].

Fig. (8). Effects of population differentiation ($F_{st}$) on the distribution of transposable elements (TEs) in the global population: A. TE abundances under different frequencies; and B. Changes in the total number of existing TEs with $F_{st}$. Results are calculated according to Ohta [76] and Wright [56] under the neutral process (see the formula in the main text). In Figure A, the values on the X-axis represent the intermediate values of fixed frequency intervals: [0.01, 0.1], [0.1, 0.2], and [0.9, 1.0]. The Y-axis represents the estimated TE abundances corresponding to the fixed frequency intervals ($= \int_{x_1}^{x_2} G(x)\,dx$). In Figure B, the total number of existing TEs is estimated by $\int_{1/(2N_{e(\mathrm{global})})}^{1} G(x)\,dx$. The common parameters used in both figures are the number of local populations $n = 30$, the effective size of each local population $N_{e(\mathrm{local})} = 30$, the transposition rate $v = 0.0001$, and the average number of TEs per genome $n_{TE} = 10$.

Another structural feature is the presence of GC-rich isochores, which form a mosaic pattern within chromosomes and are related to recombination hotspots [29, 90, 91]. Complementary to the tandem and interspersed repetitive sequences, which are mainly nongenic, GC-rich isochores are mainly distributed in coding regions, although the mechanism of their origin is still disputed between selectionists and mutationalists [70]. A pattern of correlation blocks within and between populations in terms of GC-rich isochores is expected under either the selectionist or the mutationalist view. Different natural selection intensities among GC-rich isochores can result in correlation blocks of various sizes owing to genetic hitchhiking, as implied by human genome studies [89]. The distribution of these correlation blocks may differ from those in terms of other units (e.g., TEs or multigene families). Differences in mutation among GC-rich isochores can reinforce a mosaic pattern of genomic diversity. Differences in effective population sizes or in selection intensities can result in a mosaic distribution of $F_{st}$-correlation blocks in terms of GC-rich isochores, whereas migration tends to homogenize these differences.

Perspectives 

When distinct assembly features such as multigene families, TEs, and repeats are considered jointly, the challenge is to distinguish the contribution of each to the observed pattern of correlation blocks, or to assess their relative contributions to this pattern. The preceding discussion shows how complex the processes that maintain these dynamics are; they are briefly summarized in Table 1. The relative contributions of different attributes differ among species. For example, nongenic repeats probably play a more important role in pines than in prokaryotes, since pine genomes contain a substantial amount of nongenic repeats [86]. The effects of TE perturbation are likely important in the genomes of humans and other mammals, since a majority of their repeats are TEs [17]. For a given species, one intuitive approach to evaluating their relative contributions is to compare the number and sizes of the correlation blocks by partitioning the total variation into components attributable to the different processes and testing each component for significance. The challenge of such an analysis is to identify the individual blocks in the presence of diverse evolutionary processes.
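One way to read "partitioning the total variation into the different process components" is a one-way analysis of variance of block sizes grouped by the assembly feature each block is attributed to, with significance judged by permutation. The sketch below is a minimal illustration under strong assumptions, not the authors' procedure: blocks are taken as already identified, each is assigned to exactly one (hypothetical) feature label, and each label has at least one block with at least two labels overall. A real analysis would also have to handle blocks that overlap several features.

```python
import random
from statistics import mean


def partition_variation(groups):
    """One-way ANOVA split of block sizes grouped by assembly feature.

    groups maps a feature label (hypothetical, e.g. "TE") to a non-empty list
    of block sizes. Returns (ss_between, ss_within, f_stat).
    """
    values = [x for sizes in groups.values() for x in sizes]
    grand = mean(values)
    means = {k: mean(s) for k, s in groups.items()}
    ss_between = sum(len(s) * (means[k] - grand) ** 2 for k, s in groups.items())
    ss_within = sum((x - means[k]) ** 2 for k, s in groups.items() for x in s)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:
        f_stat = float("inf") if ss_between > 0 else 0.0
    else:
        f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


def permutation_p_value(groups, n_perm=999, seed=1):
    """Permutation p-value for the F statistic.

    Block sizes are shuffled across feature labels while group sizes stay
    fixed; the p-value counts permutations with F at least as large as the
    observed F (plus one for the observed arrangement itself).
    """
    rng = random.Random(seed)
    labels = list(groups)
    counts = [len(groups[k]) for k in labels]
    pooled = [x for k in labels for x in groups[k]]
    observed = partition_variation(groups)[2]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, i = {}, 0
        for k, n in zip(labels, counts):
            shuffled[k] = pooled[i:i + n]
            i += n
        if partition_variation(shuffled)[2] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy block sizes (e.g. in kb) attributed to two features.
blocks_by_feature = {"TE": [1, 2, 3], "multigene": [10, 11, 12]}
print(partition_variation(blocks_by_feature))
print(permutation_p_value(blocks_by_feature))
```

A permutation test avoids assuming normally distributed block sizes, which are typically skewed; with only six toy blocks the smallest attainable p-value is about 0.1, so real use needs many more blocks.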

CONCLUDING REMARKS 

Correlation blocks and their distribution along the chromosomes are an important aspect of the structure of genomic diversity at the population level. Studying genomic structure requires genome-wide SNP or other marker data, which have only recently become available for studies of population genetic structure. The present synthetic review attempts to tie population structure to genomic structure by bringing forth their complex interfaces. Our discussions address how population structure shapes the pattern of correlation blocks and how evolutionary processes affect that pattern. Methods for characterizing the pattern of correlation blocks, such as the correlation of H's (genomic diversity structure within subpopulations) and the correlation of F_ST's (genomic diversity structure among subpopulations), have been presented.
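The quantities behind these two descriptors can be illustrated with a short sketch: the expected heterozygosity H = 2p(1 - p) at each site in each subpopulation, Nei's G_ST = (H_T - H_S)/H_T as a simple stand-in for F_ST, and the Pearson correlation of H between two sites across subpopulations. It assumes biallelic sites, known (not sampled) allele frequencies and equally weighted subpopulations; the function names are hypothetical, and this is not the estimator used in the original work, which must account for sampling.

```python
from math import sqrt


def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)


def gst(freqs):
    """Nei's G_ST at one site from allele frequencies in each subpopulation.

    Assumes equal subpopulation weights; returns 0.0 for a monomorphic site.
    """
    h_s = sum(heterozygosity(p) for p in freqs) / len(freqs)
    h_t = heterozygosity(sum(freqs) / len(freqs))
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def h_correlation(freq_matrix, i, j):
    """Correlation of H at sites i and j across subpopulations.

    freq_matrix[k][s] is the allele frequency at site s in subpopulation k.
    """
    h_i = [heterozygosity(row[i]) for row in freq_matrix]
    h_j = [heterozygosity(row[j]) for row in freq_matrix]
    return pearson(h_i, h_j)


# Hypothetical allele frequencies: 3 subpopulations (rows) x 3 sites (columns).
freqs = [[0.1, 0.1, 0.5],
         [0.3, 0.3, 0.4],
         [0.5, 0.5, 0.2]]
print(h_correlation(freqs, 0, 1), h_correlation(freqs, 0, 2))
print([gst([row[s] for row in freqs]) for s in range(3)])
```

Sites 0 and 1 share identical frequencies in this toy input, so their H values are perfectly correlated; a correlation matrix built this way over many sites is the kind of input a block-finding step would start from.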
Table 1. The Evolutionary Processes and their Potential Effects on Correlation Blocks within and Among Populations
Hu et al. 





Multigene family
  Selection: Facilitating the homogeneity between family members within populations. Different selection strengths between populations change F_ST-correlation blocks.
  Mutation: Selective mutation within family members may change the pattern of correlation blocks. Neutral mutation has no effects.
  Migration: Homogenizing the structure of genomic diversity between populations.
  Drift: Enhancing the differential structure of genomic diversity among populations. Enhancing the heterogeneity between family members.

Transposable elements (TEs)
  Selection: Insertion of selective TEs may change the original correlation block size. Insertion of neutral TEs into coding regions may change the original block size. Insertion of neutral TEs into noncoding regions may expand the original correlation block size. Differential selection strengths among populations may change F_ST-correlation blocks.
  Mutation: Changing the structure and function of TEs, and hence the pattern of correlation blocks.
  Migration: Homogenizing TE effects among populations. Migration of maladaptive TEs produces migration load. Migration of neutral TEs increases the number of small correlation blocks in the recipient population.
  Drift: Affecting the spread of TEs in host genomes. Enhancing the differences in the diversity of genomic structure among populations.

Repetitive nongenic sequences, GC-isochores
  Selection: Enhancing the probability of genetic hitchhiking effects. Distinct selection strengths among GC-isochores enhance different patterns of correlation blocks. Distinct selection strengths on GC-isochores among populations change F_ST-correlation blocks.
  Mutation: GC-isochore mutation enhances a mosaic pattern of correlation blocks.
  Migration: Reducing the number and size of repetitive sequences. Homogenizing the different patterns produced by GC-isochores among populations.
  Drift: The abundance of repeats is controlled by replication and drift. Different N_e's enhance different patterns of correlation blocks in terms of GC-isochores.

The consensus is that correlation blocks of various sizes do exist, and that their numbers and sizes will diminish as SNP maps become progressively denser, as in the case of haplotype block size in human genomes. With population genomic data now available for many species, it has become increasingly important to quantify and characterize the amount, distribution, and pattern of correlation blocks at the population level. This provides a population-based, genome-wide perspective for developing strategies in conservation biology, given that the number of correlation blocks is analogous to the effective number of "super sites" (after removing the redundant information from correlated diversities within each block). In eukaryotic genomes, the distribution pattern of correlation blocks is associated with genome assembly features. Multigene families, nongenic repetitive sequences, and GC-rich isochores may reinforce the pattern of correlation blocks, whereas perturbation from transposable elements increases the uncertainty of this pattern in the size and number of correlation blocks. There is considerable opportunity to explore and elucidate the relationships between the structure of genomic diversity and evolutionary processes.
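The analogy between the number of correlation blocks and an effective number of "super sites" can be illustrated with a simple greedy scan. Following the definition of a block as a segment within which diversities at any two sites are strongly correlated, the sketch walks along position-ordered sites and opens a new block whenever the next site's absolute correlation with some site already in the current block falls below a threshold. The threshold `r_min`, the greedy left-to-right rule and the function name are assumptions for illustration; published block-finding methods (e.g., for haplotype blocks) use other criteria.

```python
def correlation_blocks(corr, r_min):
    """Greedy left-to-right partition of ordered sites into correlation blocks.

    corr[i][j] is the correlation between diversities at sites i and j, with
    sites ordered by chromosomal position. A site joins the current block
    only if its absolute correlation with every site already in the block is
    >= r_min; otherwise it starts a new block. Returns lists of site indices.
    """
    blocks = []
    current = []
    for site in range(len(corr)):
        if current and all(abs(corr[site][other]) >= r_min for other in current):
            current.append(site)
        else:
            if current:
                blocks.append(current)
            current = [site]
    if current:
        blocks.append(current)
    return blocks


# Hypothetical correlation matrix for 5 ordered sites: sites 0-2 and 3-4
# are strongly correlated internally and weakly correlated with each other.
corr = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 1.0],
]
blocks = correlation_blocks(corr, r_min=0.5)
print(blocks)       # [[0, 1, 2], [3, 4]]
print(len(blocks))  # 2 "super sites" stand in for 5 correlated sites
```

The number of blocks depends on `r_min`: raising it to 0.95 in this example splits every site into its own block, which is one reason block counts should be compared only under a common criterion and marker density.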

ACKNOWLEDGEMENTS 

We sincerely thank Jean Bousquet, Tony Brown, Xuhua Xia and two reviewers for valuable comments on earlier versions of this article. The work was supported by the Alberta Livestock Industry Development Fund Ltd (ALIDF) and the Alberta Agricultural Research Institute (AARI).

REFERENCES 

[1] Barrett, J.C.; Fry, B.; Maller, J.; Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 2005, 21, 263-265.



Correlation Blocks and Population Structure 



Current Genomics, 2011, Vol. 12, No. 1 69 



[2] Slatkin, M. Linkage disequilibrium - understanding the evolutionary past and mapping the medical future. Nat. Rev. Genet., 2008, 9, 477-485.

[3] Tomita, M.; Hatsumichi, M.; Kurihara, K. Identify LD blocks based on hierarchical spatial data. Comput. Stat. Data Anal., 2008, 52, 1806-1820.

[4] Hyten, D.L.; Choi, I.-Y.; Song, Q.; Shoemaker, R.C.; Nelson, R.L.; Costa, J.M.; Specht, J.E.; Cregan, P.B. Highly variable pattern of linkage disequilibrium in multiple soybean populations. Genetics, 2007, 175, 1937-1944.

[5] Mackay, T.F.C. Genetic dissection of quantitative traits. In: Singh, R.S., Uyenoyama, M.K., (Eds). The Evolution of Population Biology. Cambridge University Press, 2004, pp. 51-73.

[6] Gabriel, S.B.; Schaffner, S.F.; Nguyen, H.; Moore, J.M.; Roy, J.; Blumenstiel, B.; Higgins, J.; DeFelice, M.; Lochner, A.; Faggart, M.; Liu-Cordero, S.N.; Rotimi, C.; Adeyemo, A.; Cooper, R.; Ward, R.; Lander, E.S.; Daly, M.J.; Altshuler, D. The structure of haplotype blocks in the human genome. Science, 2002, 296, 2225-2229.

[7] Altshuler, D.; Brooks, L.D.; Chakravarti, A.; Collins, F.S.; Daly, M.J.; Donnelly, P.; International HapMap Consortium. A haplotype map of the human genome. Nature, 2005, 437, 1299-1320.

[8] Ohta, T. Linkage disequilibrium with the island model. Genetics, 1982, 101, 139-155.

[9] Luikart, G.; England, P.R.; Tallmon, D.; Jordan, S.; Taberlet, P. The power and promise of population genomics: from genotyping to genome typing. Nat. Rev. Genet., 2003, 4, 981-994.

[10] Gibbs, J.R.; Singleton, A. Application of genome-wide single nucleotide polymorphism typing: simple association and beyond. Plos Genet., 2006, 2, e150.

[11] Stinchcombe, J.R.; Hoekstra, H.E. Combining population genomics and quantitative genetics: finding the genes underlying ecologically important traits. Heredity, 2008, 100, 158-170.

[12] Black, W.C., IV; Baer, C.F.; Antolin, M.F.; DuTeau, N.M. Population genomics: genome-wide sampling of insect populations. Ann. Rev. Ent., 2001, 46, 441-469.

[13] Goldstein, D.; Weale, M.E. Population genomics: linkage disequilibrium holds the key. Curr. Biol., 2001, 11, R576-R579.

[14] Wall, J.D.; Pritchard, J.K. Haplotype blocks and linkage disequilibrium in the human genome. Nat. Rev. Genet., 2003, 4, 587-597.

[15] Thornton, K.R.; Jensen, J.D.; Becquet, C.; Andolfatto, P. Progress and prospects in mapping recent selection in the genome. Heredity, 2007, 98, 340-348.

[16] Charlesworth, B. Recombination, genome size and chromosome number. In: Cavalier-Smith, T., Ed. The Evolution of Genome Size. John Wiley & Sons, Chichester, 1985, pp. 489-513.

[17] Lynch, M. The Origins of Genome Architecture. Sinauer Associates, Sunderland, Massachusetts, 2007.

[18] Charlesworth, D. Balancing selection and its effects on sequences in nearby genome regions. Plos Genet., 2006, 2, e64.

[19] Li, W.H.; Nei, M. Stable linkage disequilibrium without epistasis in subdivided populations. Theor. Popul. Biol., 1974, 6, 173-183.

[20] Slatkin, M. Gene flow and selection in a two-locus system. Genetics, 1975, 81, 787-802.

[21] Hu, X.S. Barriers to the spread of neutral alleles in the cytonuclear system. Evolution, 2008, 62, 2260-2278.

[22] Charlesworth, B.; Morgan, M.T.; Charlesworth, D. The effects of deleterious mutations on neutral molecular variation. Genetics, 1993, 134, 1289-1303.

[23] Maynard Smith, J.; Haigh, J. The hitch-hiking effect of a favorable gene. Genet. Res., 1974, 23, 23-35.

[24] Hill, W.G. Disequilibrium among several linked neutral genes in finite populations. I. Mean changes in disequilibrium. Theor. Popul. Biol., 1974, 5, 366-392.

[25] Weir, B.S. Genetic Data Analysis II. Sinauer Associates, 1996.

[26] Eberle, M.A.; Rieder, M.J.; Kruglyak, L.; Nickerson, D.A. Allele frequency matching between SNPs reveals an excess of linkage disequilibrium in genic regions of the human genome. Plos Genet., 2006, 2, 1319-1327.

[27] Yang, R.C. Analysis of multilocus zygotic associations. Genetics, 2002, 161, 435-445.

[28] Hill, W.G.; Robertson, A. The effect of linkage on limits to artificial selection. Genet. Res., 1966, 8, 269-294.

[29] Spencer, C.C.A.; Deloukas, P.; Hunt, S.; Mullikin, J.; Myers, S.; Silverman, B.; Donnelly, P.; Bentley, D.; McVean, G. The influence of recombination on human genetic diversity. Plos Genet., 2006, 2, 1375-1385.



[30] Myers, S.; Bottolo, L.; Freeman, C.; McVean, G.; Donnelly, P. A fine-scale map of recombination rates and hotspots across the human genome. Science, 2005, 310, 321-324.

[31] Coop, G.; Przeworski, M. An evolutionary view of human recombination. Nat. Rev. Genet., 2007, 8, 23-34.

[32] Morton, N.E.; Simpson, S.P. Kinship mapping of multilocus systems. Hum. Genet., 1983, 64, 103-104.

[33] Hu, X.S. Estimating the correlation of pairwise relatedness along chromosomes. Heredity, 2005, 94, 338-346.

[34] Hu, X.S.; Wang, Z. Estimating the correlation of non-allele descents along chromosomes. Genet. Res., 2010, 92 (in press).

[35] Hernandez-Sanchez, J.; Haley, C.S.; Woolliams, J.A. On the prediction of simultaneous inbreeding coefficients at multiple loci. Genet. Res., 2004, 83, 113-120.

[36] Hill, W.G.; Weir, B.S. Prediction of multi-locus inbreeding coefficients and relation to linkage disequilibrium in random mating populations. Theor. Popul. Biol., 2007, 72, 179-185.

[37] Brown, G.R.; Gill, G.P.; Kuntz, R.J.; Langley, C.H.; Neale, D.B. Nucleotide diversity and linkage disequilibrium in loblolly pine. Proc. Natl. Acad. Sci. USA, 2004, 101, 15255-15260.

[38] Sved, J.A. Linkage disequilibrium and homozygosity of chromosome segments in finite population. Theor. Popul. Biol., 1971, 2, 125-141.

[39] Cockerham, C.C.; Weir, B.S. Descent measures for two loci with some applications. Theor. Popul. Biol., 1973, 4, 300-330.

[40] Robertson, A. The effect of inbreeding on the variation due to recessive genes. Genetics, 1952, 37, 189-207.

[41] Ohta, T.; Kimura, M. Linkage disequilibrium due to random genetic drift. Genet. Res., 1969, 13, 47-55.

[42] Hill, W.G.; Robertson, A. Linkage disequilibrium in finite populations. Theor. Appl. Genet., 1968, 38, 226-231.

[43] Percus, J.K. Mathematics of Genome Analysis. Cambridge University Press, 2002.

[44] Hahn, M.W. Accurate inference and estimation in population genomics. Mol. Biol. Evol., 2006, 23, 911-918.

[45] Begun, D.J.; Holloway, A.K.; Stevens, K.; Hillier, L.W.; Poh, Y.P.; Hahn, M.W.; Nista, P.M.; Jones, C.D.; Kern, A.D.; Dewey, C.N.; Pachter, L.; Myers, E.; Langley, C.H. Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. Plos Biol., 2007, 5, e310.

[46] Lewontin, R.C. The interaction of selection and linkage. I. General considerations: heterotic models. Genetics, 1964, 49, 49-67.

[47] Nei, M. Genetic distance between populations. Am. Nat., 1972, 106, 283-292.

[48] Akey, J.M.; Zhang, G.; Zhang, K.; Jin, L.; Shriver, M.D. Interrogating a high-density SNP map for signatures of natural selection. Genome Res., 2002, 12, 1805-1814.

[49] MacEachern, S.; Hayes, B.; McEwan, J.; Goddard, M. An examination of positive selection and changing effective population size in Angus and Holstein cattle populations (Bos taurus) using a high density SNP genotyping platform and the contribution of ancient polymorphism to genomic diversity in Domestic cattle. BMC Genomics, 2009, 10, 181.

[50] Hu, X.S.; He, F.L. Background selection and population differentiation. J. Theor. Biol., 2005, 235, 207-219.

[51] Hu, X.S. F_ST in the cytonuclear system. Theor. Popul. Biol., 2010, 77, 105-118.

[52] Reich, D.E.; Cargill, M.; Bolk, S.; Ireland, J.; Sabeti, P.C.; Richter, D.J.; Lavery, T.; Kouyoumjian, R.; Farhadian, S.F.; Ward, R.; Lander, E.S. Linkage disequilibrium in the human genome. Nature, 2001, 411, 199-204.

[53] Wright, S. Evolution and the Genetics of Populations. Vol. 2. The Theory of Gene Frequencies. The University of Chicago Press, Chicago, 1969.

[54] Merila, J.; Crnokrak, P. Comparison of genetic differentiation at marker loci and quantitative traits. J. Evol. Biol., 2001, 14, 892-903.

[55] Cockerham, C.C.; Weir, B.S. Correlation, descent measures: drift with migration and mutation. Proc. Natl. Acad. Sci. USA, 1987, 84, 8512-8514.

[56] Wright, S. Isolation by distance. Genetics, 1943, 28, 114-138.

[57] Whitlock, M.; Barton, N.H. The effective population size with migration and extinction. Genetics, 1997, 146, 427-441.

[58] Hu, X.S.; Ennos, R.A. Impacts of seed and pollen flow on population genetic structure for plant genomes with three contrasting modes of inheritance. Genetics, 1999, 152, 441-450.




[59] Hu, X.S. A preliminary approach to the theory of geographical gene genealogy for plant genomes with three different modes of inheritance and its application. Acta Genet. Sin., 2000, 27, 440-448.

[60] Kitada, S.; Kitakado, T.; Kishino, H. Empirical Bayes inference of pairwise F_ST and its distribution in the genome. Genetics, 2007, 177, 861-873.

[61] Hill, W.G. Predictions of response to artificial selection from new mutations. Genet. Res., 1982, 40, 255-278.

[62] Hu, X.S.; Li, B. Additive genetic variation and the distribution of QTN effects among sites. J. Theor. Biol., 2006, 243, 76-85.

[63] Walsh, J.B. Interaction of selection and biased gene conversion in a multigene family. Proc. Natl. Acad. Sci. USA, 1985, 82, 153-157.

[64] Drouin, G.; Prat, R.; Ell, M.; Paul-Clarke, G.D. Detecting and characterizing gene conversions between multigene family members. Mol. Biol. Evol., 1999, 16, 1369-1390.

[65] Kimura, M.; Ohta, T. Population genetics of multigene family with special reference to decrease of genetic correlation with distance between gene members on a chromosome. Proc. Natl. Acad. Sci. USA, 1983, 76, 4001-4005.

[66] Hu, X.S.; Li, B. On the migration load of seeds and pollen grains in a local population. Heredity, 2003, 90, 162-168.

[67] Hu, X.S. Migration load in males and females. Theor. Popul. Biol., 2006, 70, 183-200.

[68] Wright, S. Evolution in Mendelian populations. Genetics, 1931, 16, 97-159.

[69] Kimura, M. Diffusion models in population genetics. J. Appl. Prob., 1964, 1, 177-232.

[70] Li, W.H. Molecular Evolution. Sinauer Associates, Sunderland, 1997.

[71] Demuth, J.P.; De Bie, T.; Stajich, J.E.; Cristianini, N.; Hahn, M.W. The evolution of mammalian gene families. Plos One, 2006, 1, e85.

[72] Brookfield, J.F.Y.; Badge, R.M. Population genetics models of transposable elements. Genetica, 1997, 100, 281-294.

[73] Charlesworth, B.; Charlesworth, D. The population dynamics of transposable elements. Genet. Res., 1983, 42, 1-27.

[74] Kidwell, M.G.; Lisch, D.R. Transposable elements as sources of variation in animals and plants. Proc. Natl. Acad. Sci. USA, 1997, 94, 7704-7711.

[75] Langley, C.H.; Brookfield, J.F.Y.; Kaplan, M.L. Transposable elements in Mendelian populations. I. A theory. Genetics, 1983, 104, 457-480.

[76] Ohta, T. Population genetics of transposable elements. J. Math. Appl. Med. Biol., 1984, 1, 17-29.




[77] Agrawal, A.; Eastman, Q.M.; Schatz, D.G. Implications of transposition mediated by V(D)J-recombination proteins RAG1 and RAG2 for origins of antigen-specific immunity. Nature, 1998, 39, 8-23.

[78] Barton, N.H. Genetic hitchhiking. Philos. Trans. R. Soc. Lond. B, 2000, 355, 1553-1562.

[79] Slatkin, M. Genetic differentiation of transposable elements under mutation and unbiased gene conversion. Genetics, 1985, 110, 145-158.

[80] Escobar-Paramo, P.; Ghosh, S.; DiRuggiero, J. Evidence for genetic drift in the diversification of a geographically isolated population of the hyperthermophilic archaeon Pyrococcus. Mol. Biol. Evol., 2005, 22, 2297-2303.

[81] Deceliere, G.; Charles, S.; Biemont, C. The dynamics of transposable elements in structured populations. Genetics, 2005, 169, 467-474.

[82] Kimura, M. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, 1983.

[83] Hu, X.S.; He, F.L.; Hubbell, S.P. Neutral theory in macroecology and population genetics. Oikos, 2006, 113, 548-556.

[84] Venner, S.; Feschotte, C.; Biemont, C. Dynamics of transposable elements: towards a community ecology of the genome. Trends Genet., 2009, 25, 317-323.

[85] Feschotte, C.; Jiang, N.; Wessler, S.R. Plant transposable elements: where genetics meets genomics. Nat. Rev. Genet., 2002, 3, 329-341.

[86] Morse, A.M.; Peterson, D.G.; Islam-Faridi, M.N.; Smith, K.E.; Magbanua, Z.; Garcia, S.A.; Kubisiak, T.L.; Amerson, H.V.; Carlson, J.E.; Nelson, C.D.; Davis, J.M. Evolution of genome size and complexity in Pinus. Plos One, 2009, 4, e4332.

[87] Belshaw, R.; Bensasson, D. The rise and fall of introns. Heredity, 2006, 96, 208-213.

[88] Lynch, M.; Conery, J.S. The origins of genome complexity. Science, 2003, 302, 1401-1404.

[89] Cai, J.J.; Macpherson, J.M.; Sella, G.; Petrov, D.A. Pervasive hitchhiking at coding and regulatory sites in humans. Plos Genet., 2009, 5, e1000336.

[90] Aerts, S.; Thijs, G.; Dabrowski, M.; Moreau, Y.; De Moor, B. Comprehensive analysis of the base composition around the transcription start site in Metazoa. BMC Genomics, 2004, 5, 34.

[91] Forsdyke, D.R. Regions of relative GC% uniformity are recombinational isolators. J. Biol. Syst., 2004, 12, 261-271.



